Psychometrika


VOLUME XVIII, 1953
JANUARY-DECEMBER


Editorial Council

Chairman: Harold Gulliksen
Editors: M. W. Richardson, Paul Horst


Editorial Board

R. L. Anderson
J. B. Carroll
H. S. Conrad
L. J. Cronbach
E. E. Cureton
Allen Edwards
Max D. Engelhart
Henry E. Garrett
J. P. Guilford
Harold Gulliksen
Charles M. Harsh
Paul Horst
Alston S. Householder
Lyle V. Jones
Truman L. Kelley
Albert K. Kurtz
Irving Lorge
Quinn McNemar
Frederick Mosteller
George E. Nicholson
M. W. Richardson
Wm. Stephenson
Godfrey Thomson
R. L. Thorndike
L. L. Thurstone
Ledyard Tucker
S. S. Wilks


Managing Editor: Dorothy C. Adkins
Assistant Managing Editor: B. J. Winer


PUBLISHED QUARTERLY
BY THE PSYCHOMETRIC SOCIETY
AT 1407 SHERWOOD AVENUE
RICHMOND 5, VIRGINIA

















CONTENTS


TEST RELIABILITY AND EFFECTIVE TEST LENGTH
William H. Angoff

A LEAST-SQUARES SOLUTION FOR CASE IV OF THE LAW OF COMPARATIVE JUDGMENT
W. A. Gibson

AN ANALYTICAL SOLUTION FOR APPROXIMATING SIMPLE STRUCTURE IN FACTOR ANALYSIS
John B. Carroll

A NEW STATUS INDEX DERIVED FROM SOCIOMETRIC ANALYSIS
Leo Katz

A FACTOR ANALYSIS OF INTRA-TASK PERFORMANCE ON TWO PSYCHOMOTOR TESTS
Edwin A. Fleishman

AN APPLICATION OF CONFIDENCE INTERVALS AND OF MAXIMUM LIKELIHOOD TO THE ESTIMATION OF AN EXAMINEE'S ABILITY
Frederic M. Lord

A REVISED ORTHOGONAL ROTATIONAL SOLUTION FOR THURSTONE'S PRIMARY MENTAL ABILITIES TEST BATTERY
Wayne S. Zimmerman

GERHARD TINTNER, Econometrics
A Review by L. R. Klein










VOLUME EIGHTEEN MARCH 1953 NUMBER 1 








ERRATUM 


In Cureton, Edward E., Note on the scaling of ratings or rankings when 
the numbers per subject are unequal, Psychometrika, 1952, 17, 397-399, the 
second term of the numerator of Equation (2) should be —2(1/,SX;) rather 
than —MZSX. 











PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


TEST RELIABILITY AND EFFECTIVE TEST LENGTH* 


WILLIAM H. ANGOFF


EDUCATIONAL TESTING SERVICE 


Measures of effective test length are developed for speeded and power 
tests, which are independent of the number of items in the test or of the time 
required for administration. These measures are used in determining re- 
liability for (1) speeded and power tests, where a separately timed short 
parallel form is administered in addition to the full-length test; (2) power 
tests, where a subset of items is imbedded within the total test, parallel to 
the total test; and (3) power tests, where the subset of items is correlated 
with the complementary parallel subset in the test. 


In a previous article, Cronbach (1) has pointed out that the characteristics
of mental measurement that make the estimation of error particularly
difficult are twofold. First, the very act of measuring produces a
noticeable change in the object measured. The task of responding to test
items, particularly items of a cognitive nature, is in itself a learning task, and
on a second administration there is a variable positive bias in test performance
which is generally attributed to increased test wisdom or to more specific
acquaintance with test content. Second, uncontrolled changes
during the process of measurement, as well as changes associated with growth
and senescence (or learning and forgetting), also produce a changed
performance on the second administration. In both cases the changed
performance can be interpreted, in the context of test reliability, only as variable
error unassociated with the reliability of the measuring instrument, and
operating to reduce the size of the reliability coefficient.

In order to avoid attenuating the reliability coefficient with experimental
error resulting from a second administration, methods have been developed
for measuring reliability through the use of statistics taken from a single test
administration. In general, two such methods have been made available,
the Kuder-Richardson formulas and the split-half method (with
Spearman-Brown correction for half length), as well as variants of these later developed.

While these methods have yielded relatively satisfactory results for 
power tests, where sufficient time is given for all examinees to attempt all 
items, they have been considered totally inadequate for speeded tests. 


*The writer gratefully acknowledges the assistance of Dr. Ledyard R. Tucker in the
formulation of some of the concepts presented in this paper. He wishes also to express his
appreciation for the helpful comments of Dr. Harold Gulliksen and Dr. Frederic M. Lord
in their review of the manuscript.









Guilford (3, 486) and Thorndike (7, 582), for example, have pointed out that 
an odd-even split of items in a purely speeded test would yield a correlation 
between test halves of unity, regardless of the reliability of the test. On the 
other hand, assuming that all examinees complete the first half of the test, a 
split of the first half against the second half would yield an indeterminate 
correlation, since the variability on the first half would be zero. In general, 
then, the computed reliability will be largely a function of the manner in 
which the test split has been made, and will tend not to reflect the actual 
reliability of the test in terms of the theoretical parallel-forms coefficient. 

The Kuder-Richardson formulas are similarly inadequate for speeded 
tests. In speeded tests, where discrimination among examinees is made in 
terms of the differential number of items answered in a specified length of 
time, the inter-item covariances within a test are higher than they would be 
between parallel items on different forms of the test (7, 588). Since the 
reliability of the total test is a direct function of the reliabilities of the indi- 
vidual items (measured in this case in terms of inter-item correlations), the 
value of the reliability coefficient for the total test is thereby inflated. 

In view of the inadequacies of the reliability formulas, it appears that 
there are at present no single-administration techniques for estimating the 
reliability of speeded tests. Guttman (5), in fact, maintains that reliability 
in general cannot be estimated from a single trial, and that all single-trial 
reliabilities are, in effect, lower bounds. Cronbach and Warrington (2) and 
Gulliksen (4) have developed lower-bound estimates of the reliability of 
speeded tests, but precise single-administration techniques are not available. 
Guilford (3, 486, 487) suggests the application of a split-half technique
in which both test halves are given in separately timed administrations in
immediate succession. (One of the difficulties of this method that first comes
to mind is the matter of deciding on the appropriate time limits for the
separate halves which would match the degree of speededness of the total test
given in one administration.) The only alternative method is to devise an
additional full-length parallel speeded test and to obtain an equivalent-form 
correlation. This procedure raises at least two problems: The first, dis- 
cussed by Cronbach and Warrington, is the expense of constructing an 
alternate form solely for the purpose of providing reliability coefficients for 
a published test. The second problem relates to the questionable assump- 
tion that the parallel test is truly of the same effective length as the original 
test, merely because the numbers of items and the scheduled test times for 
the two tests are equal. In the case of speeded tests, variations in the amounts 
of time necessary to answer the items will cause substantial variations in the 
effective lengths of the tests. 

The purpose of the present paper is first to suggest that the problem of 
economy in obtaining the reliability of a speeded test may be at least partly 
solved by administering a short parallel form in addition to the regular test. 










Second, the purpose is to provide a measure of functional or effective test 
length and incorporate that measure into the reliability coefficient. In 
a later section of this paper, corresponding methods will be discussed for 
computing the reliability of unspeeded tests where the short parallel form is 
imbedded within the regular test, and only one administration of the test 
is given. In the latter case, the reliability is probably better interpreted 
as a lower-bound reliability or an index of internal consistency. 

Case I. The determination of the reliability, $r_{tt}$, of test $t$ from the
correlation between test $t$ and test $i$, a separately timed test, parallel to test
$t$. While the more stringent case of speeded tests is treated, the method
applies equally well to the case of unspeeded tests.

We shall consider that a short test, $i$, has been devised to parallel in
function, level and spread of item difficulty, and items per unit of time a
long test, $t$, which is speeded and for which a test reliability is to be
determined. In connection with the requirement of parallelism it is assumed that
the tests have been equated for spuriousness, in the sense that Cronbach and
Warrington (2, 169) have used the term. In their paper they point out
that in an unspeeded test “especial difficulty on one of the items neither
increases nor decreases the person’s probable standing on the remainder. But
in a timed test, the person who gets stuck on one item may never reach the
remainder of the items. It is this interdependence of items that introduces
spuriousness.” Finally, it is considered that, contained in test $t$, there are $n$
tests, $j$, of effective length $i$, all parallel to test $i$. The correlation between
tests $i$ and $t$ is given by:


$$r_{it} = r_{x_i(x_a + x_b + \cdots + x_j + x_k + \cdots + x_n)} = \frac{\sum_{j=1}^{n} C_{ij}}{\sigma_i\sigma_t} = \frac{n\,C_{ij}}{\sigma_i\sigma_t}, \qquad (1)$$

where $\sum C_{ij}$ is the sum of the covariances $r_{ij}\sigma_i\sigma_j$ between test $i$ and each of
the parallel forms $j$ of effective length $i$ contained in test $t$. The value of $n$
is the number of tests of effective length $i$ contained in $t$, or the ratio of
effective lengths, $t$ to $i$.

The variance of test $t$ may be written:

$$\sigma_t^2 = \sum_{j=1}^{n}\sigma_j^2 + \sum_{j=1}^{n}\sum_{\substack{k=1\\ k\neq j}}^{n} C_{jk} = n\sigma_j^2 + n(n-1)\,C_{jk}. \qquad (2)$$

In general, throughout the development of the formulations to follow,
it will be assumed that (a) average covariances involving parallel tests of
equivalent length are equal, so that $\bar{C}_{ij} = \bar{C}_{jk}$; (b) any variance (or
covariance) is equal to the average of all other variances (or covariances) involving
parallel tests of equivalent length, so that $\sigma_j^2 = \sigma_i^2$, $C_{ij} = \bar{C}_{ij}$, and that
$C_{jk} = \bar{C}_{jk}$.
















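Then, solving (1) for $C_{ij}$, substituting in (2) for $C_{jk}$, replacing $\sigma_j$ by
$\sigma_i$, we obtain $\sigma_t^2 = n\sigma_i^2 + (n-1)\,r_{it}\sigma_i\sigma_t$; solving this for $n$, we have

$$n = \frac{\sigma_t(\sigma_t + r_{it}\sigma_i)}{\sigma_i(\sigma_i + r_{it}\sigma_t)}. \qquad (3)$$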


Equation (3) yields a value of n which is determined not from the 
arbitrary ratio of the numbers of items in the two tests or from the ratio of 
time lengths, but from the data yielded by the test experiment itself. Par- 
ticularly in speeded tests, neither the ratio of time lengths nor the ratio of 
numbers of items is suitable for estimating effective n. For one thing, as- 
suming that no one completes the test, and that speed is the primary source 
of test variance, the distribution of test scores is highly sensitive to changes 
in total time limit as well as to changes in spuriousness (see above), but 
not at all sensitive to the addition of test items. Secondly, extraneous 
factors such as the period of warm-up at the beginning of the test would 
operate to reduce the effective test time in the short test to a greater extent, 
proportionally, than in the long test. Consequently, it would seem appropri- 
ate that a measure of effective test length be used in estimating reliability, 
such as that expressed in equation (3) rather than the ratio of the numbers 
of test items or the ratio of test times. 

It may be of some interest to note that if $r_{it} = 1.00$, then $n = \sigma_t/\sigma_i$,
and that if $r_{it} = .00$, then $n = \sigma_t^2/\sigma_i^2$. Consequently, for $0 \le r_{it} \le 1$ and
$\sigma_t \ge \sigma_i$, we can establish that $\sigma_t^2/\sigma_i^2 \ge n \ge \sigma_t/\sigma_i$. It may also be observed
from equation (3) that if the standard deviations of the tests are equal, then
$n = 1$, and the tests are of equivalent length.
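As a quick numerical check of equation (3) and the bounds just stated, here is a minimal Python sketch. The function name `effective_length` and the standard deviations used are illustrative assumptions, not values from the paper.

```python
def effective_length(sd_i: float, sd_t: float, r_it: float) -> float:
    """Equation (3): effective length n of long test t, in units of the
    separately timed short parallel test i."""
    return sd_t * (sd_t + r_it * sd_i) / (sd_i * (sd_i + r_it * sd_t))


# Hypothetical standard deviations, chosen only for illustration.
sd_i, sd_t = 2.0, 6.0

n_upper = effective_length(sd_i, sd_t, 0.0)  # r_it = .00 -> (sd_t / sd_i) ** 2
n_lower = effective_length(sd_i, sd_t, 1.0)  # r_it = 1.00 -> sd_t / sd_i
n_mid = effective_length(sd_i, sd_t, 0.5)    # falls between the two bounds

print(n_upper, n_lower, n_mid)          # 9.0 3.0 4.2
print(effective_length(5.0, 5.0, 0.7))  # equal standard deviations -> 1.0
```

Because $n$ decreases steadily as $r_{it}$ rises when $\sigma_t > \sigma_i$, the two limiting correlations give the two bounds.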

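It will be convenient at this point to state the reliability of test $t$ in
terms of its correlation with test $i$. Consider that test $t$ is correlated with a
parallel test of equivalent length, composed of $n$ tests of length $i$:

$$r_{tt} = r_{(x_a + x_b + \cdots + x_j + \cdots + x_n)(x_{a'} + x_{b'} + \cdots + x_{k'} + \cdots + x_{n'})} = \frac{\sum C_{jk}}{\sigma_t^2} = \frac{n^2 C_{jk}}{\sigma_t^2}, \qquad (4)$$

where $\sum C_{jk} = \sum r_{jk}\sigma_j\sigma_k$. In accordance with the assumption of equal
average covariances stated above, $C_{jk} = C_{ij}$. Thus,

$$r_{tt} = \frac{n^2 C_{ij}}{\sigma_t^2} = \frac{n\,r_{it}\sigma_i}{\sigma_t}. \qquad (5)$$

Substituting (3) in (5), we find

$$r_{tt} = \frac{r_{it}(\sigma_t + r_{it}\sigma_i)}{\sigma_i + r_{it}\sigma_t}. \qquad (6)$$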













Equation (6) gives a method for determining the reliability of a test
from its correlation with a parallel test, not necessarily of the same length.
In examining the practicability of equation (6) it is observed that this is
the formula to be used when estimating the reliability of a test from the
correlation of any two parallel tests, even those presumed to be of the same
effective length. If the standard deviations of the two tests are equal, and
the tests are of equivalent length, then the reliability, $r_{tt}$, is identical to
$r_{it}$, the correlation between the two tests. However, if the standard
deviations are unequal, and the tests are incorrectly presumed to be of equivalent
length, then the correlation between the two tests will be different from the
reliability of either test. In effect, the value of $n$ must be considered and
incorporated into the determination of reliability, and it would be necessary
to decide beforehand whether the reliability of test $t$ is to be determined, or
the reliability of test $i$. If the standard deviations of the two tests are
different, then different results will be found.

Particular emphasis should be given to the basic assumption inherent
in the present formulations: tests $i$ and $t$ must be parallel tests. If that
assumption is violated in a choice of a non-parallel test $i$, then the reliability
of test $t$ may well be grossly underestimated.

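Finally, it may be observed that if (1) is substituted in (5),

$$r_{tt} = \frac{n^2 C_{ij}}{\sigma_t^2}, \qquad (7)$$

or

$$n = \frac{\sigma_t\sqrt{r_{tt}}}{\sqrt{C_{ij}}}.$$

If it is assumed that $C_{ij} = r_{ii}\sigma_i^2$, then

$$n = \frac{\sigma_t\sqrt{r_{tt}}}{\sigma_i\sqrt{r_{ii}}}. \qquad (8)$$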


The value of $n$ is seen to be the ratio of the standard deviations of true scores
in the (mutually exclusive) long and short tests.
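The pieces of Case I can be checked together on a hypothetical population built to satisfy the paper's assumptions exactly: test $t$ consists of four parallel parts, each with the same standard deviation as test $i$ ($\sigma = 2$), and every covariance between parallel tests equals $r_{ii}\sigma_i^2$ with $r_{ii} = .50$. Under these assumed values the sketch below should recover $n = 4$ from both (3) and (8), and a reliability from (5) and (6) that matches the Spearman-Brown value for four parts. It is an algebraic consistency check only, not a reproduction of the author's computations; the function names are illustrative.

```python
import math

def effective_length(sd_i, sd_t, r_it):
    # Equation (3)
    return sd_t * (sd_t + r_it * sd_i) / (sd_i * (sd_i + r_it * sd_t))

def reliability_from_short_test(sd_i, sd_t, r_it):
    # Equation (6): reliability of t from its correlation with exclusive test i
    return r_it * (sd_t + r_it * sd_i) / (sd_i + r_it * sd_t)

# Hypothetical population satisfying the parallelism assumptions.
n, sd_i, r_ii = 4, 2.0, 0.5
cov = r_ii * sd_i ** 2                               # C_ij = C_jk
sd_t = math.sqrt(n * sd_i ** 2 + n * (n - 1) * cov)  # equation (2)
r_it = n * cov / (sd_i * sd_t)                       # equation (1)

n_hat = effective_length(sd_i, sd_t, r_it)
r_tt = reliability_from_short_test(sd_i, sd_t, r_it)
r_tt_eq5 = n_hat * r_it * sd_i / sd_t                       # equation (5)
n_eq8 = sd_t * math.sqrt(r_tt) / (sd_i * math.sqrt(r_ii))   # equation (8)
spearman_brown = n * r_ii / (1 + (n - 1) * r_ii)

print(round(n_hat, 6), round(r_tt, 6), round(r_tt_eq5, 6), round(n_eq8, 6))
# 4.0 0.8 0.8 4.0
print(spearman_brown)  # 0.8
```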

Case II. The determination of the reliability, $r_{tt}$, of an unspeeded test,
$t$, from the correlation between test $t$ and a subset of items, test $j$, included
in test $t$.

If test $t$ is not speeded and the principal source of test variance lies in
the differential abilities of the examinees to respond correctly to test items,
then a single test administration is capable of yielding an internal consistency
reliability coefficient. Let us consider that there exists and can be chosen
a subset of items, test $j$, contained in test $t$, that parallels the parent test in
function and difficulty. Further, consider that there are $n$ such parallel
subtests contained in test $t$, all mutually exclusive. Then, making use of
the same assumptions of equivalence as were made for equation (1) above,*


*Except for equating the characteristic of spuriousness. See statement of Cronbach
and Warrington quoted above.










we can state the correlation between test $j$ and its parent test, $t$, as follows:

$$r_{jt} = r_{x_j(x_a + x_b + \cdots + x_j + x_k + \cdots + x_n)} = \frac{\sigma_j^2 + \sum_{k\neq j} C_{jk}}{\sigma_j\sigma_t} = \frac{\sigma_j^2 + (n-1)\,C_{jk}}{\sigma_j\sigma_t}. \qquad (9)$$

We have observed that

$$\sigma_t^2 = n\sigma_j^2 + n(n-1)\,C_{jk}. \qquad (2)$$

If we now solve (9) for $C_{jk}$ and substitute in (2), we obtain
$\sigma_t^2 = n\,r_{jt}\sigma_j\sigma_t$, or

$$n = \frac{\sigma_t}{r_{jt}\sigma_j}. \qquad (10)$$

Now solving equation (2) for $C_{jk}$ and substituting in (7) for its equivalent,
$C_{ij}$,

$$r_{tt} = \frac{n(\sigma_t^2 - n\sigma_j^2)}{(n-1)\,\sigma_t^2}, \qquad (11)$$

which is exactly parallel to Kuder and Richardson’s formula (20), with $n\sigma_j^2$
in place of $\sum pq$. Finally, if the value of $n$ found in (10) is substituted in (11),

$$r_{tt} = \frac{r_{jt}\sigma_t - \sigma_j}{r_{jt}(\sigma_t - r_{jt}\sigma_j)}. \qquad (12)$$

Equation (12) gives the reliability of an unspeeded test, $t$, obtained from
the correlation between $t$ and its parallel subtest $j$, and their standard
deviations.
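A similar sketch for Case II, again on a hypothetical population of four parallel parts ($\sigma_j = 2$, covariance 2 between parts), one of which is the contained subtest $j$. The function names are illustrative assumptions. Equations (10), (11), and (12) should return the $n$ and the reliability the population was built with.

```python
import math

def effective_length_contained(sd_j, sd_t, r_jt):
    # Equation (10): n = sigma_t / (r_jt * sigma_j)
    return sd_t / (r_jt * sd_j)

def reliability_from_subtest(sd_j, sd_t, r_jt):
    # Equation (12): reliability of t from its correlation with contained subtest j
    return (r_jt * sd_t - sd_j) / (r_jt * (sd_t - r_jt * sd_j))

# Hypothetical population: t is made of n parallel parts, one of which is j.
n, sd_j, r_jj = 4, 2.0, 0.5
cov = r_jj * sd_j ** 2                               # C_jk between parts
sd_t = math.sqrt(n * sd_j ** 2 + n * (n - 1) * cov)  # equation (2)
r_jt = (sd_j ** 2 + (n - 1) * cov) / (sd_j * sd_t)   # equation (9)

n_hat = effective_length_contained(sd_j, sd_t, r_jt)
r_tt = reliability_from_subtest(sd_j, sd_t, r_jt)
# Equation (11), the analogue of Kuder-Richardson formula (20)
r_tt_eq11 = n * (sd_t ** 2 - n * sd_j ** 2) / ((n - 1) * sd_t ** 2)

print(round(n_hat, 6), round(r_tt, 6), round(r_tt_eq11, 6))  # 4.0 0.8 0.8
```

With these assumed values $r_{jt}$ comes out near .791, larger than the value of about .632 that a separately administered parallel test of the same length would show against the same $t$; that gap is the spuriousness examined later in the paper.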

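Case III. The determination of the reliability, $r_{tt}$, of an unspeeded test,
$t$, from the correlation between its complementary parallel parts, $h$ and $j$,
and their standard deviations.

It will be observed that equation (12) may be written as follows:

$$r_{tt} = \frac{\sigma_t^2\,(C_{jt} - \sigma_j^2)}{C_{jt}\,(\sigma_t^2 - C_{jt})}, \qquad (13)$$

where $C_{jt} = r_{jt}\sigma_j\sigma_t$. If $h = t - j$, then $C_{jt} = \sigma_j^2 + C_{hj}$ and
$\sigma_t^2 - C_{jt} = \sigma_h^2 + C_{hj}$, so that

$$r_{tt} = \frac{\sigma_t^2\,C_{hj}}{(\sigma_j^2 + C_{hj})(\sigma_h^2 + C_{hj})} = \frac{4C_{hj}}{\sigma_t^2 - \left(\dfrac{\sigma_h^2 - \sigma_j^2}{\sigma_t}\right)^2}, \qquad (14)$$

and

$$r_{tt} = \frac{r_{hj}\,\sigma_t^2}{(\sigma_j + r_{hj}\sigma_h)(\sigma_h + r_{hj}\sigma_j)}. \qquad (15)$$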










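The value of $\sigma_t^2$ may be taken from the following expression,

$$\sigma_t^2 = \sigma_h^2 + \sigma_j^2 + 2r_{hj}\sigma_h\sigma_j,$$

and substituted in (15) to yield

$$r_{tt} = \frac{r_{hj}\,(\sigma_h^2 + \sigma_j^2 + 2r_{hj}\sigma_h\sigma_j)}{(\sigma_h + r_{hj}\sigma_j)(\sigma_j + r_{hj}\sigma_h)}, \qquad (16)$$

so that all values used are taken from the subtest scores.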

It may be noted from (16) that if the test split has been made in such
a way as to produce parallel tests of equal effective length, that is, when $h$
and $j$ are equivalent tests and $\sigma_h = \sigma_j$, then

$$r_{tt} = \frac{2r_{hj}}{1 + r_{hj}},$$

which is the familiar Spearman-Brown correction for half length.
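The next sketch (Python, hypothetical values) illustrates why (16) matters when the two parts are unequal. A test built from four parallel parts (each $\sigma = 2$, covariance 2 between parts, so its reliability is .80 by construction) is split into a one-part $j$ and a three-part $h$. Equation (16) recovers .80, while the ordinary Spearman-Brown step-up applied to $r_{hj}$ does not; with equal halves the two agree. The function `four_cov_form` is an algebraically equivalent rearrangement written in terms of the covariance $C_{hj}$, included only as a cross-check; all names are illustrative.

```python
import math

def reliability_from_parts(sd_h, sd_j, r_hj):
    # Equation (16): reliability of t = h + j from the correlation between
    # its two complementary parallel parts, not necessarily of equal length.
    num = r_hj * (sd_h ** 2 + sd_j ** 2 + 2 * r_hj * sd_h * sd_j)
    return num / ((sd_h + r_hj * sd_j) * (sd_j + r_hj * sd_h))

def four_cov_form(sd_h, sd_j, r_hj):
    # Equivalent covariance form: 4C / (var_t - (var_h - var_j)**2 / var_t)
    c = r_hj * sd_h * sd_j
    var_t = sd_h ** 2 + sd_j ** 2 + 2 * c
    return 4 * c / (var_t - (sd_h ** 2 - sd_j ** 2) ** 2 / var_t)

def spearman_brown_half(r):
    return 2 * r / (1 + r)

# Hypothetical unequal split of four parallel parts (variance 4, covariance 2).
sd_j = 2.0                            # one part
sd_h = math.sqrt(3 * 4.0 + 6 * 2.0)   # three parts: 3 variances + 6 ordered covariances
r_hj = 6.0 / (sd_h * sd_j)            # C_hj = 3 * 2 = 6

print(round(reliability_from_parts(sd_h, sd_j, r_hj), 6))  # 0.8
print(round(four_cov_form(sd_h, sd_j, r_hj), 6))           # 0.8
print(round(spearman_brown_half(r_hj), 4))                 # 0.7596 (wrongly assumes equal halves)
# With equal halves, (16) reduces to Spearman-Brown.
print(math.isclose(reliability_from_parts(3.0, 3.0, 0.6), spearman_brown_half(0.6)))  # True
```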

To complete the analogy between $i$-exclusive-of-$t$ and $j$-contained-in-$t$:
It is clear that the counterpart, for the “contained” case, of equation (8)
(where $n$ is expressed as the ratio of the standard deviations of true scores,
$t$ to $i$), is directly analogous to the “exclusive” case. If it is assumed that
$C_{jk} = r_{jj}\sigma_j^2$, then equation (8) may be restated:

$$n = \frac{\sigma_t\sqrt{r_{tt}}}{\sigma_j\sqrt{r_{jj}}}. \qquad (17)$$

In the case of power tests, test length has usually been measured in
terms of the number of items. However, if the items near the beginning of
the test are correctly answered by everyone in the group, or if the items near
the end of the test are correctly answered by no one in the group, then the
test is obviously not effectively of the length arbitrarily assumed. Some
measure of test length should be used, such as that implied in (17), which
takes into account the number of items effectively discriminating among
the members of the tested group.

It may be argued that if the short test is ideally chosen with respect to
level and range of item difficulty, then the value of $n$ will remain constant,
irrespective of the performance of the particular group. However, since the
ideal is not achieved in practice, it is necessary to determine the value of $n$
in the particular instance. In effect, the direct determination of effective $n$
allows a greater degree of laxity in the choice of items for the subtest, but
does become a necessary adjunct to the determination of reliability.
Particularly important is the fact that the choice of items for the subtest need not
be restricted by any arbitrary prior decision regarding its length, since its
length would be determined in conjunction with the determination of the
reliability. With that restriction removed, greater freedom can be devoted
to making the subtest truly parallel in function and distribution of item
difficulty.

It may be well to repeat that throughout these formulations it is assumed
that the subtest of items, $j$, is parallel to the long test, $t$. If this assumption
is not met in practice, then the reliability of test $t$ will be underestimated.

It will be of some interest to examine the relationship among $r_{tt}$, $r_{ii}$,
and $r_{it}$, where test $i$ is exclusive of $t$, and to compare that relationship with
that found among $r_{tt}$, $r_{jj}$, and $r_{jt}$, where $j$ is contained in $t$. If we consider
the “exclusive” case first, we note in equation (1) that

$$r_{it} = \frac{n\,C_{ij}}{\sigma_i\sigma_t}. \qquad (1)$$

If we assume that $C_{ij} = r_{ii}\sigma_i^2$, and substitute in (1) the value of $n$ found in
(8), then

$$r_{it} = \sqrt{r_{ii}\,r_{tt}}. \qquad (18)$$

Equation (18) has otherwise been obtained by stating the correlation between
parallel forms of the same test, adjusted for attenuation due to unreliability,
and considering that the correlation between true scores on parallel tests is
equal to unity.
Considering the “contained” case, we note in equation (10) that

$$n = \frac{\sigma_t}{r_{jt}\sigma_j}. \qquad (10)$$

If the value of $n$ found in equation (17) is substituted in (10), it is found that

$$r_{jt} = \sqrt{\frac{r_{jj}}{r_{tt}}}. \qquad (19)$$


It is observed in comparing (18) with (19) that the relationship among
$r_{it}$, $r_{ii}$, and $r_{tt}$ in the “exclusive” case is quite different from the relationship
among $r_{jt}$, $r_{jj}$, and $r_{tt}$ in the “contained” case. When the short test is
exclusive of $t$, then $r_{it}$ is equal to the square root of the product of the
reliabilities of the short and long tests (equation 18); when the short test is
contained in $t$, then $r_{jt}$ is equal to the square root of the ratio of the
reliabilities of the short and long tests (equation 19).

It can be shown that equations (18) and (19) are not inconsistent, if
account is taken of the spuriousness in (19). Since tests $i$ and $j$ are parallel
and of equal length, assume that $r_{ii} = r_{jj}$. Solving (18) for $r_{tt}$ and
substituting in (19), and also substituting $r_{ii}$ for $r_{jj}$,

$$r_{jt} = \frac{r_{ii}}{r_{it}}. \qquad (20)$$

Now substituting in (20) the values of $r_{it}$ and $r_{jt}$ found respectively in (1)
and (10), and assuming that $C_{ij} = r_{ii}\sigma_i\sigma_j$, equation (20) results in an identity.










It may be of interest to examine further the relationship between $r_{it}$
and $r_{jt}$. If tests $i$ and $j$ are parallel and of equivalent length, as has been
assumed throughout this development, it would appear obvious that $r_{jt} >
r_{it}$, because of the spuriousness in $r_{jt}$. The degree of this spuriousness can be
shown in that $r_{jt} = \sqrt{r_{jj}r_{tt}} + k$, where

$$k = \frac{\sigma_j(1 - r_{jj})}{\sigma_t} = \frac{1 - r_{jj}}{\sqrt{n + n(n-1)\,r_{jj}}}.$$

Assuming that the reliabilities $r_{ii}$ and $r_{jj}$ are equal, then it is clear that
$r_{jt} \neq r_{it}$, and that the relationship $r_{jt} = \sqrt{r_{jj}r_{tt}}$ cannot hold unless
$r_{jj} = 1.00$.
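Consider that the correlation between true scores on $j$ and $t$ ($j$ included
in $t$) is unity. Then

$$1.00 = r_{(x_j - e_j)(x_t - e_t)} = \frac{\sum x_jx_t - \sum x_je_t - \sum e_jx_t + \sum e_je_t}{N\sigma_j\sigma_t\sqrt{r_{jj}r_{tt}}}, \qquad (21)$$

where, for example, $j_\infty$ and $e_j$ are the true and error components of $x_j$, such
that $x_j = j_\infty + e_j$ (all scores being expressed as deviations from their means).

Examining each term separately, and noting that $e_t = e_a + e_b + \cdots + e_j + \cdots + e_n$,
we find

$$\frac{\sum x_je_t}{N} = \frac{\sum (j_\infty + e_j)(e_a + e_b + \cdots + e_j + \cdots + e_n)}{N} = \frac{\sum e_j^2}{N} = \sigma_{e_j}^2,$$

$$\frac{\sum e_jx_t}{N} = \frac{\sum e_j(t_\infty + e_a + e_b + \cdots + e_j + \cdots + e_n)}{N} = \frac{\sum e_j^2}{N} = \sigma_{e_j}^2,$$

$$\frac{\sum e_je_t}{N} = \frac{\sum e_j(e_a + e_b + \cdots + e_j + \cdots + e_n)}{N} = \frac{\sum e_j^2}{N} = \sigma_{e_j}^2.$$

Other terms go to zero, since true and error components are uncorrelated, as
are the errors of different parts. Then

$$1.00 = \frac{r_{jt}\sigma_j\sigma_t - \sigma_{e_j}^2}{\sigma_j\sigma_t\sqrt{r_{jj}r_{tt}}}, \qquad (22)$$

and, since $\sigma_{e_j}^2 = \sigma_j^2(1 - r_{jj})$,

$$r_{jt} = \sqrt{r_{jj}r_{tt}} + \frac{\sigma_j(1 - r_{jj})}{\sigma_t},$$

and finally, writing $\sigma_t^2 = n\sigma_j^2 + n(n-1)\,r_{jj}\sigma_j^2$ from (2),

$$r_{jt} = \sqrt{r_{jj}r_{tt}} + \frac{1 - r_{jj}}{\sqrt{n + n(n-1)\,r_{jj}}}. \qquad (23)$$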
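Equations (18), (19), (20), and (23) can be checked against one another numerically. The sketch assumes, as the derivation does, parallel parts with $C_{jk} = r_{jj}\sigma_j^2$, so that $r_{tt}$ follows the Spearman-Brown formula; $n = 4$ and $r_{jj} = .50$ are illustrative choices, and the helper names are hypothetical.

```python
import math

def spearman_brown(n, r):
    return n * r / (1 + (n - 1) * r)

def spuriousness(n, r_jj):
    # Equation (23): excess of r_jt over sqrt(r_jj * r_tt) when j is contained in t
    return (1 - r_jj) / math.sqrt(n + n * (n - 1) * r_jj)

n, r_jj = 4, 0.5                 # hypothetical; r_ii = r_jj for parallel i and j
r_tt = spearman_brown(n, r_jj)
r_it = math.sqrt(r_jj * r_tt)    # equation (18): i exclusive of t
r_jt = math.sqrt(r_jj / r_tt)    # equation (19): j contained in t
k = spuriousness(n, r_jj)

print(round(r_it, 4), round(r_jt, 4), round(k, 4))  # 0.6325 0.7906 0.1581
print(math.isclose(r_jt, r_it + k))     # equation (23): True
print(math.isclose(r_jt * r_it, r_jj))  # equation (20): True
```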










It has been observed that equation (18) holds only when the short and
long tests are mutually exclusive, and that (19) is applicable when the short
test is contained in the long test. The use of (18) when (19) is justified may
well lead to questionable results. For example, it appears that in their Case
II development, Kuder and Richardson (6) made inappropriate use of
equation (18), since they were dealing with the correlation between an item and
the test in which it was contained.* If their equation (3) is restated in the
present notation (and renumbered a),

$$r_{tt} = \frac{\sigma_t^2 - \sum p_jq_j + \sum r_{jj}\,p_jq_j}{\sigma_t^2}, \qquad (a)$$

and $r_{tt}\,r_{jt}^2$ is substituted for $r_{jj}$ instead of $r_{jt}^2/r_{tt}$, which they used, then

$$r_{tt} = \frac{\sigma_t^2 - \sum p_jq_j + r_{tt}\sum r_{jt}^2\,p_jq_j}{\sigma_t^2}. \qquad (b)$$

Solving for $r_{tt}$, we find

$$r_{tt} = \frac{\sigma_t^2 - \sum p_jq_j}{\sigma_t^2 - \sum r_{jt}^2\,p_jq_j} \qquad (c)$$

instead of the equation (8) presented in their article,

$$r_{tt} = \frac{\sigma_t^2 - \sum p_jq_j}{2\sigma_t^2} + \sqrt{\left(\frac{\sigma_t^2 - \sum p_jq_j}{2\sigma_t^2}\right)^2 + \frac{\sum r_{jt}^2\,p_jq_j}{\sigma_t^2}}.$$
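The practical effect of the two substitutions for $r_{jj}$ in (a) can be seen on a small, made-up set of 0/1 item responses. The sketch below computes population (divide-by-$N$) statistics, then evaluates Kuder-Richardson formula (20), the positive root of (a) under the substitution $r_{jj} = r_{jt}^2/r_{tt}$ that Kuder and Richardson used, and equation (c), which uses $r_{jj} = r_{tt}r_{jt}^2$. The response matrix and the helper name `item_statistics` are hypothetical; with only four items, $n$ is small.

```python
import math

# Hypothetical 0/1 item responses: 6 examinees (rows) by 4 items (columns).
X = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

def item_statistics(X):
    """Total-score variance, item p*q values, and item-test correlations (divide by N)."""
    N, k = len(X), len(X[0])
    totals = [sum(row) for row in X]
    mean_t = sum(totals) / N
    var_t = sum((t - mean_t) ** 2 for t in totals) / N
    pq, r_jt = [], []
    for j in range(k):
        col = [row[j] for row in X]
        p = sum(col) / N
        cov = sum((x - p) * (t - mean_t) for x, t in zip(col, totals)) / N
        pq.append(p * (1 - p))
        r_jt.append(cov / math.sqrt(p * (1 - p) * var_t))
    return var_t, pq, r_jt

var_t, pq, r_jt = item_statistics(X)
k = len(pq)
sum_pq = sum(pq)
sum_r2pq = sum(r ** 2 * v for r, v in zip(r_jt, pq))

kr20 = k / (k - 1) * (1 - sum_pq / var_t)
a = (var_t - sum_pq) / (2 * var_t)
kr_eq8 = a + math.sqrt(a ** 2 + sum_r2pq / var_t)   # root of (a) with r_jj = r_jt**2 / r_tt
angoff_c = (var_t - sum_pq) / (var_t - sum_r2pq)    # equation (c): r_jj = r_tt * r_jt**2

print(round(kr20, 4), round(kr_eq8, 4), round(angoff_c, 4))  # 0.6667 0.8179 0.6757
```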


A second instance in the Kuder-Richardson article of the inappropriate
use of equation (18) appears in their Case III development, in the step from
their equation (9) to equation (10). They assumed that $r_{jj}\,r_{tt} = r_{jt}^2$, which
does not hold if item $j$ is contained in test $t$. If their equation (9) is restated
in the present notation (and renumbered d),

$$r_{tt} = \frac{\left(\sum \sqrt{r_{jj}\,p_jq_j}\right)^2}{\sigma_t^2}, \qquad (d)$$

and $r_{tt}\,r_{jt}^2$ is correctly substituted for $r_{jj}$, then $r_{tt}$ disappears entirely, and

$$\sigma_t^2 = \left(\sum r_{jt}\sqrt{p_jq_j}\right)^2, \qquad (e)$$

instead of their equation (10),

$$r_{tt} = \frac{\sum r_{jt}\sqrt{p_jq_j}}{\sigma_t}.$$


*It should be pointed out that the relationship between $r_{jj}$, $r_{tt}$, and $r_{jt}$ which was
used in the Kuder-Richardson article is not basic to the derivation of their formulas (20)
and (21). There is no implication in the present paper that those formulas need revision.





Actually, the amount of error incurred when (18) is used instead of
(19) is quite small, if $n$ is large. In general, this is true in the
Kuder-Richardson developments described above, where a single item is correlated with
the entire test. If, for example, $n$ is taken as 100 and $r_{ii}$ (or $r_{jj}$) is taken
as .10, then $r_{it} = .303$ and $r_{jt} = .330$, a difference of only .027 between the
spurious and the non-spurious correlations. Similarly, the error in $r_{tt}$ is
small. If the spurious value, $r_{jt}$, is used in (18), then $r_{tt}$ is found to be
.930 instead of .917 if (19) is (properly) used, $r_{jj}$ being expressed in each
case in terms of $r_{tt}$ by the Spearman-Brown formula. If, on the other hand, test $t$
is effectively not much longer than test $j$, then the difference in correlations
can be quite appreciable. Suppose, for example, $n = 4$ and $r_{ii}$ (or $r_{jj}$) $=
.50$. Then $r_{it} = .632$ and $r_{jt} = .791$. If the spurious correlation, $r_{jt}$, is
improperly applied in equation (18), then $r_{tt}$ will appear to be much higher
than it should: .90 instead of .80.
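The two numerical examples above can be reproduced with a short Python sketch. It assumes, as the text implies, that $r_{tt}$ is tied to $r_{jj}$ by the Spearman-Brown formula; the helper names are hypothetical. The first function treats an observed correlation as if it were the exclusive-case $\sqrt{r_{jj}r_{tt}}$ of (18), the second as the contained-case $\sqrt{r_{jj}/r_{tt}}$ of (19), and each solves for $r_{tt}$.

```python
import math

def r_tt_via_eq18(n, r):
    # r**2 = r_jj * r_tt with r_jj = r_tt / (n - (n - 1) * r_tt) (inverted Spearman-Brown),
    # i.e. r_tt**2 + (n - 1) r**2 r_tt - n r**2 = 0; take the positive root.
    b = (n - 1) * r ** 2
    return (-b + math.sqrt(b ** 2 + 4 * n * r ** 2)) / 2

def r_tt_via_eq19(n, r):
    # r**2 = r_jj / r_tt = 1 / (n - (n - 1) * r_tt)
    return (n - 1 / r ** 2) / (n - 1)

for n, r_jj in [(100, 0.10), (4, 0.50)]:
    r_tt = n * r_jj / (1 + (n - 1) * r_jj)   # Spearman-Brown
    r_it = math.sqrt(r_jj * r_tt)            # non-spurious, equation (18)
    r_jt = math.sqrt(r_jj / r_tt)            # spurious, equation (19)
    print(n, round(r_it, 3), round(r_jt, 3),
          round(r_tt_via_eq18(n, r_jt), 3),  # improper use of (18)
          round(r_tt_via_eq19(n, r_jt), 3))  # proper use of (19)
```

Rounded to three places, the first row gives .303, .330, .930, and .917, and the second gives .632, .791, .901, and .800, in line with the figures quoted in the text.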

Tables 1 and 2 describe the results of some computations which serve to
illustrate the usefulness of the estimates made in equations (6) and (12).
Table 1 relates to the reliability of speeded tests, and Table 2 to power tests.
In Table 1, the results are presented for four replications (with variations)
of an experiment in which four randomly chosen groups of male college-level
students were administered separately timed speeded tests in mathematics.
Test 1 contained 16 items in free-answer and multiple-choice form,
alternately presented. Tests 2 and 3 were each composed of two tests (37
free-answer items and 37 multiple-choice items), each separately timed. Thus,
scores on Test 1 were derived from 16 items, while scores on Tests 2 and 3
were each derived from 74 items; all three tests contained both free-answer
and multiple-choice items.*

For each of the four groups of students, Table 1 gives three estimates of
reliability of Test 2, and three for Test 3. The value $r_{23}$ may be considered,
as it has been in the past, to be the reliability of either test, with the
(unwarranted) presumption that the two tests are of equivalent length. In addition, $r_{22}$ is
estimated from the correlation of Test 2 with the shorter Test 1, adjusted for test
length, and also from its correlation with Test 3, also adjusted for differences
in test length. Similarly, the reliability of Test 3 is estimated from its
correlation with Test 1 and also from its correlation with Test 2. It is seen that
the estimates of reliability for Test 2, as derived from equation (6), are close,
as are the estimates of reliability for Test 3. The largest difference, .027, is
that between the two estimates for $r_{33}$ (Group C). Even this difference may
be accounted for in part by the lack of complete parallelism between Tests
2 and 3 and Test 1. (It is recalled that Tests 2 and 3 had two time limits,
while Test 1 had only one.)


*The author wishes to express his appreciation to Dr. L. B. Plumlee for providing
the data summarized in Table 1.


TABLE 1
Reliabilities and Ratios of Effective Test Lengths for Speeded Tests: Case I


TABLE 2
Reliabilities and Ratios of Effective Test Lengths for Unspeeded Tests: Case II

The right-hand side of Table 1 describes the ratios of effective test 
lengths, as determined from equation (8). In general, Tests 2 and 3 appear 
to be 3.5 to 4.1 times as long as Test 1, in spite of the fact that they contain 
about 4.6 times as many items as Test 1. It is also observed that in Groups A 
and C Test 3 appears to be effectively longer than Test 2, which accounts for 
the slightly higher estimates of reliability for Test 3. In Group B, the lengths 
are seen to be about equal, while in Group D, Test 3 is the shorter of the 
two. Finally, in the last two columns it is observed that the two independent 
estimates of the ratio of effective length, Test 3 to Test 2, are extremely close. 

Table 2 relates to the reliability of unspeeded tests. Three forms of a
150-item verbal and numerical reasoning test were administered: in the
case of VN-0, to an extremely heterogeneous group of over 2000 examinees,
and in the case of VN-2 and VN-3, to larger but more homogeneous
groups of examinees. In the case of each test, a sample of 500 cases was
drawn at random from the parent group of examinees. For Test VN-0,
four mutually exclusive subtests were chosen, each composed of slightly less
than one-fourth of the total number of items in the parent test. Scores on
the subtests were then correlated with scores on the total test, yielding the
values in the column headed $r_{jt}$. For Tests VN-2 and VN-3, one subtest
was chosen for each and correlated with its parent test. Estimates of
reliability were then made in each instance in accordance with equation (12)
and also in accordance with Kuder and Richardson’s formula (20). It is
seen that in each case the two estimates of reliability are close, differing at
most by .009. It is also observed that the ratios of effective lengths determined
from equation (10) are similar to the ratios of the numbers of items in the
subtest and parent test.

In summary, a number of equivalent methods become available for the
calculation of the reliability of a whole test without making any arbitrary
assumptions about the relative lengths of the subtest and total test. The formulas
presented here in the case of both speeded and power tests make use only
of the general assumptions that the short test or subtest is truly a
representative and parallel miniature of the long test, in terms of item content and
level and spread of item difficulty. The amount by which the correlation
between the short test (or subtest) and the long test is to be “stepped up” is
determined and incorporated in the equations.

In practice, equation (6) should be used with speeded tests where the 
administration of an additional test is required to determine reliability. The 
equivalent alternative is to use equation (5) in conjunction with (3). In the 
case of power tests, three general procedures are possible: 

(1) A double administration with mutually exclusive but parallel tests, 
in which reliability is to be determined from the correlation between the 










parallel tests. In this case as in the case of the speeded tests, equation (6) 
would be used, or (5) in conjunction with (3). 

(2) A single administration in which reliability is to be determined
by breaking off a subtest of items parallel to the total test and correlating
the subtest score with the total test score. In this case equation (12) seems
practicable, since it requires only one correlation and the standard deviations
obtained as its by-products. The equivalent alternative to (12) is equation (14),
which is algebraically concise but considerably more laborious.

(3) A single administration in which reliability is to be determined
by splitting the total test into two parallel subtests (not necessarily of equal
length) and correlating the subtests. In this case, equations (15) and (16)
are appropriate. Formula (14) is also appropriate here, subject to the
reservation about its laboriousness noted under procedure (2).
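For procedure (3), one well-known formula for two parallel parts of unequal length is the coefficient now usually called the Angoff-Feldt coefficient:

ρ = 4σ₁₂ / [σ_x² − (σ₁² − σ₂²)²/σ_x²]

That equations (15) and (16) are of this type is an assumption here, since they are not restated in this section. The sketch below checks the formula on population values from the same effective-length model used above. The function name is hypothetical.

```python
def unequal_parts_reliability(var_1: float, var_2: float, cov_12: float) -> float:
    """Hypothetical helper: reliability of a total test split into two parallel
    parts of possibly unequal effective length (Angoff-Feldt form)."""
    var_x = var_1 + var_2 + 2.0 * cov_12          # variance of the total score
    return 4.0 * cov_12 / (var_x - (var_1 - var_2) ** 2 / var_x)


# Parts of relative length 0.3 and 0.7 of a test with true variance 1.0 and
# error variance 0.25 (population reliability 0.8):
var_1 = 0.3**2 + 0.3 * 0.25
var_2 = 0.7**2 + 0.7 * 0.25
cov_12 = 0.3 * 0.7
print(round(unequal_parts_reliability(var_1, var_2, cov_12), 6))
```

When the two parts have equal variances, the correction term vanishes. The expression then reduces to 4σ₁₂/σ_x², which equals the Spearman-Brown value 2r/(1 + r).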


REFERENCES 
1. Cronbach, L. J. Test “reliability”: its meaning and determination. Psychometrika, 
1947, 12, 1-16. 
2. Cronbach, L. J., and Warrington, W. G. Time-limit tests: estimating their reliability
and degree of speeding. Psychometrika, 1951, 16, 167-188.
3. Guilford, J. P. Fundamental statistics in psychology and education (2nd ed.). New
York: McGraw-Hill Book Co., 1950.
4. Gulliksen, H. The reliability of speeded tests. Psychometrika, 1950, 15, 259-269.
5. Guttman, L. A basis for estimating test-retest reliability. Psychometrika, 1945, 10,
255-282.
6. Kuder, G. F., and Richardson, M. W. The theory and estimation of test reliability.
Psychometrika, 1937, 2, 151-160.
7. Thorndike, R. L. Reliability. In Lindquist, E. F., Educational measurement.
Washington, D. C.: American Council on Education, 1951.




Manuscript received 5/19/52 


Revised manuscript received 6/28/52 








PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


A LEAST-SQUARES SOLUTION FOR CASE IV OF THE LAW OF 
COMPARATIVE JUDGMENT 


W. A. GIBSON


UNIVERSITY OF NORTH CAROLINA 


Case IV of Thurstone’s Law of Comparative Judgment is displayed
as a system of homogeneous linear equations for which a least-squares solution
is presented, using various conditional equations which fix the origin
and the unit of measurement. The computational load is, however, quite
heavy.


This paper will present a least-squares solution for Case IV of Thurstone’s 
Law of Comparative Judgment (1, 281-282). The aim is merely to show that 
such a solution exists, rather than seriously to suggest its employment in all 
practical applications of the method of paired comparisons. 

Case IV of the Law of Comparative Judgment is as follows (1, 281):

$$S_j - S_k = \frac{1}{\sqrt{2}}\,X_{jk}\,(\sigma_j + \sigma_k), \tag{1}$$

where

$S_j$ = scale value or average perceived position of stimulus $j$ on the
psychological continuum,

$S_k$ = scale value or average perceived position of stimulus $k$ on the
psychological continuum,

$\sigma_j$ = discriminal dispersion or standard deviation of the distribution
of perceived positions of stimulus $j$ on the psychological continuum,

$\sigma_k$ = discriminal dispersion or standard deviation of the distribution
of perceived positions of stimulus $k$ on the psychological continuum,

$X_{jk}$ = normal deviate corresponding to the empirical proportion of
judgments $j > k$.


The assumptions involved in this statement of the law are (a) that there exists 
a unidimensional psychological continuum along which the perceptions 
of some attribute of a set of stimuli can be located; (b) that the distribution 
of perceptions of each of the stimuli along that psychological continuum is 
normal in form; (c) that the correlation between the paired perceptions of 
any two stimuli is zero; and (d) that the discriminal dispersions are all of the 
same order of magnitude (1, 273-281). 
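Equation (1) and the definition of X_jk can be illustrated in both directions. First, an observed proportion of judgments j > k is converted to a unit-normal deviate. Second, a trial set of S and σ values implies a deviate for each pair. The sketch assumes Python's standard-library `statistics.NormalDist`, and the function names are illustrative.

```python
import math
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()   # mean 0, standard deviation 1


def normal_deviate(p_jk: float) -> float:
    """X_jk: unit-normal deviate for the proportion of judgments j > k (0 < p < 1)."""
    return _UNIT_NORMAL.inv_cdf(p_jk)


def case_iv_deviate(s_j: float, s_k: float, sigma_j: float, sigma_k: float) -> float:
    """X_jk implied by equation (1): S_j - S_k = X_jk (sigma_j + sigma_k) / sqrt(2)."""
    return math.sqrt(2.0) * (s_j - s_k) / (sigma_j + sigma_k)


# A stimulus one unit above another, both with unit discriminal dispersion:
x = case_iv_deviate(1.0, 0.0, 1.0, 1.0)       # sqrt(2)/2
p = _UNIT_NORMAL.cdf(x)                       # implied proportion of judgments j > k
print(round(x, 4), round(p, 4), round(normal_deviate(p), 4))
```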










By a consideration of the analytic geometry involved in the plots of
columns of the square table of X values, Thurstone has worked out an
ingenious and rapid approximate solution for the S and σ values under Case
IV (2, 293-296). The solution presented here has the advantage of providing
the best-fitting S and σ values for a set of paired-comparison data, but it has
the serious limitation that the computational labor becomes increasingly
prohibitive as the number of stimuli grows.

We may write equation (1) explicitly for several of the stimuli as follows: 


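$$\begin{aligned}
S_1 - S_2 &= \tfrac{1}{\sqrt{2}}X_{12}(\sigma_1 + \sigma_2),\\
S_1 - S_3 &= \tfrac{1}{\sqrt{2}}X_{13}(\sigma_1 + \sigma_3),\\
S_1 - S_4 &= \tfrac{1}{\sqrt{2}}X_{14}(\sigma_1 + \sigma_4),\\
S_2 - S_3 &= \tfrac{1}{\sqrt{2}}X_{23}(\sigma_2 + \sigma_3),\\
S_2 - S_4 &= \tfrac{1}{\sqrt{2}}X_{24}(\sigma_2 + \sigma_4),\\
S_3 - S_4 &= \tfrac{1}{\sqrt{2}}X_{34}(\sigma_3 + \sigma_4).
\end{aligned}\tag{1a}$$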


Equations (1a) constitute a set of six linearly independent homogeneous
linear equations in eight unknowns. Since the zero point and the unit of
measurement are arbitrary (cf. 1, 281), we may obtain a unique,
non-trivial solution in the case of four stimuli by fixing the origin and
the unit of measurement with conditional equations such as the following:


$$S_4 = 0, \tag{2}$$
and
$$\sigma_4 = 1. \tag{3}$$


Substituting equations (2) and (3) into equations (1a) and transposing, we
get the following set of six linear equations in six unknowns:


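$$\begin{aligned}
S_1 - S_2 - \tfrac{1}{\sqrt{2}}X_{12}\sigma_1 - \tfrac{1}{\sqrt{2}}X_{12}\sigma_2 &= 0,\\
S_1 - S_3 - \tfrac{1}{\sqrt{2}}X_{13}\sigma_1 - \tfrac{1}{\sqrt{2}}X_{13}\sigma_3 &= 0,\\
S_1 - \tfrac{1}{\sqrt{2}}X_{14}\sigma_1 &= \tfrac{1}{\sqrt{2}}X_{14},\\
S_2 - S_3 - \tfrac{1}{\sqrt{2}}X_{23}\sigma_2 - \tfrac{1}{\sqrt{2}}X_{23}\sigma_3 &= 0,\\
S_2 - \tfrac{1}{\sqrt{2}}X_{24}\sigma_2 &= \tfrac{1}{\sqrt{2}}X_{24},\\
S_3 - \tfrac{1}{\sqrt{2}}X_{34}\sigma_3 &= \tfrac{1}{\sqrt{2}}X_{34}.
\end{aligned}\tag{4}$$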


Equations (4) may be solved by any of the methods of solution for $r$
independent linear equations in $r$ unknowns. Thus a unique solution, except
for the origin and the unit of measurement, is possible with four stimuli,
while an overdetermined system will be available for more than four stimuli.
For each stimulus $k$ that is added, $k - 1$ new equations are added,
while only two new unknowns, $S_k$ and $\sigma_k$, are introduced.

Equations (4) may be stated in matrix form as follows: 


AB = C, (5) 


where the coefficient matrix A, the matrix of unknowns B, and the matrix of 
constant terms C are defined below for the case of four stimuli: 





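$$
A = \begin{bmatrix}
1 & -1 & 0 & -\tfrac{1}{\sqrt2}X_{12} & -\tfrac{1}{\sqrt2}X_{12} & 0\\
1 & 0 & -1 & -\tfrac{1}{\sqrt2}X_{13} & 0 & -\tfrac{1}{\sqrt2}X_{13}\\
1 & 0 & 0 & -\tfrac{1}{\sqrt2}X_{14} & 0 & 0\\
0 & 1 & -1 & 0 & -\tfrac{1}{\sqrt2}X_{23} & -\tfrac{1}{\sqrt2}X_{23}\\
0 & 1 & 0 & 0 & -\tfrac{1}{\sqrt2}X_{24} & 0\\
0 & 0 & 1 & 0 & 0 & -\tfrac{1}{\sqrt2}X_{34}
\end{bmatrix},\quad
B = \begin{bmatrix} S_1\\ S_2\\ S_3\\ \sigma_1\\ \sigma_2\\ \sigma_3 \end{bmatrix},\quad
C = \begin{bmatrix} 0\\ 0\\ \tfrac{1}{\sqrt2}X_{14}\\ 0\\ \tfrac{1}{\sqrt2}X_{24}\\ \tfrac{1}{\sqrt2}X_{34} \end{bmatrix}.
$$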























Since A is square when only four stimuli are involved, the unique solution of
equation (5) in that case is simply
$$B = A^{-1}C. \tag{6}$$
For more than four stimuli, a solution of equation (5) can be obtained by
premultiplying that equation through by $(A'A)^{-1}A'$, leaving
$$B = (A'A)^{-1}A'C. \tag{7}$$







Equation (7) can be shown (cf. 3, 173) to be a least-squares solution
in the sense that the S’s and σ’s it yields minimize the sum of
the squared discrepancies of the entries in the matrix product AB from the
corresponding entries in C.
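Equation (7) can be sketched directly. The code below assumes NumPy. It builds the rows of A and C in the lexical order (1, 2), (1, 3), ... used above, with the last stimulus fixed by S_N = 0 and σ_N = 1, and then solves the normal equations. The data generator and function names are hypothetical. The check only confirms that error-free Case IV data are recovered.

```python
import itertools
import math

import numpy as np


def case_iv_deviates(S, sigma):
    """Illustrative data generator: X_jk implied by equation (1) for every pair."""
    S, sigma = np.asarray(S, float), np.asarray(sigma, float)
    return math.sqrt(2.0) * (S[:, None] - S[None, :]) / (sigma[:, None] + sigma[None, :])


def solve_fixed_last_stimulus(X):
    """Least-squares S and sigma by equation (7), with S_N = 0 and sigma_N = 1.

    Unknowns are S_1..S_{N-1}, sigma_1..sigma_{N-1}; X[j, k] is used for j < k."""
    n = X.shape[0]
    m = n - 1
    rows, rhs = [], []
    for j, k in itertools.combinations(range(n), 2):   # lexical order of pairs
        c = X[j, k] / math.sqrt(2.0)
        row = np.zeros(2 * m)
        row[j], row[m + j] = 1.0, -c                   # S_j - c * sigma_j
        if k < m:                                      # stimulus k is still unknown
            row[k], row[m + k] = -1.0, -c              # - S_k - c * sigma_k
            rhs.append(0.0)
        else:                                          # k = N: S_N = 0, sigma_N = 1
            rhs.append(c)
        rows.append(row)
    A, C = np.array(rows), np.array(rhs)
    B = np.linalg.solve(A.T @ A, A.T @ C)              # B = (A'A)^{-1} A'C
    return np.append(B[:m], 0.0), np.append(B[m:], 1.0)


S_true = np.array([1.1, 0.4, -0.3, 0.7, 0.0])          # hypothetical, five stimuli
sigma_true = np.array([0.8, 1.3, 0.9, 1.1, 1.0])
S_hat, sigma_hat = solve_fixed_last_stimulus(case_iv_deviates(S_true, sigma_true))
print(np.round(S_hat, 6), np.round(sigma_hat, 6))
```

With observed proportions, X would hold the normal deviates of the data, and the fit would no longer be exact.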

A slightly different approach which will have a greater degree of symme- 
try and will involve more of the unknowns in each of the observation equations 
is to replace the conditional equations (2) and (3) by the following ones: 


$$\sum_{j=1}^{N} S_j = 0, \tag{8}$$

and

$$\sum_{j=1}^{N} \sigma_j = N. \tag{9}$$

Equation (9) may be multiplied through by $\tfrac{1}{\sqrt{2}}X_{jk}$ to give

$$\frac{1}{\sqrt{2}}X_{jk}\sum_{i=1}^{N}\sigma_i = \frac{1}{\sqrt{2}}N X_{jk}. \tag{10}$$

Adding equation (8) and the appropriate equation (10) to each of equations
(1) and transposing, we get, for five stimuli,


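$$\begin{aligned}
2S_1 + S_3 + S_4 + S_5 + \tfrac{1}{\sqrt2}X_{12}(\sigma_3+\sigma_4+\sigma_5) &= \tfrac{1}{\sqrt2}NX_{12},\\
2S_1 + S_2 + S_4 + S_5 + \tfrac{1}{\sqrt2}X_{13}(\sigma_2+\sigma_4+\sigma_5) &= \tfrac{1}{\sqrt2}NX_{13},\\
2S_1 + S_2 + S_3 + S_5 + \tfrac{1}{\sqrt2}X_{14}(\sigma_2+\sigma_3+\sigma_5) &= \tfrac{1}{\sqrt2}NX_{14},\\
2S_1 + S_2 + S_3 + S_4 + \tfrac{1}{\sqrt2}X_{15}(\sigma_2+\sigma_3+\sigma_4) &= \tfrac{1}{\sqrt2}NX_{15},\\
S_1 + 2S_2 + S_4 + S_5 + \tfrac{1}{\sqrt2}X_{23}(\sigma_1+\sigma_4+\sigma_5) &= \tfrac{1}{\sqrt2}NX_{23},\\
S_1 + 2S_2 + S_3 + S_5 + \tfrac{1}{\sqrt2}X_{24}(\sigma_1+\sigma_3+\sigma_5) &= \tfrac{1}{\sqrt2}NX_{24},\\
S_1 + 2S_2 + S_3 + S_4 + \tfrac{1}{\sqrt2}X_{25}(\sigma_1+\sigma_3+\sigma_4) &= \tfrac{1}{\sqrt2}NX_{25},\\
S_1 + S_2 + 2S_3 + S_5 + \tfrac{1}{\sqrt2}X_{34}(\sigma_1+\sigma_2+\sigma_5) &= \tfrac{1}{\sqrt2}NX_{34},\\
S_1 + S_2 + 2S_3 + S_4 + \tfrac{1}{\sqrt2}X_{35}(\sigma_1+\sigma_2+\sigma_4) &= \tfrac{1}{\sqrt2}NX_{35},\\
S_1 + S_2 + S_3 + 2S_4 + \tfrac{1}{\sqrt2}X_{45}(\sigma_1+\sigma_2+\sigma_3) &= \tfrac{1}{\sqrt2}NX_{45}.
\end{aligned}\tag{11}$$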







Each of equations (11) contains $2N - 3$ unknowns, while some of equations
(4) contain four unknowns and others two. In all, there are $2N$ unknowns in
equations (11), while there are $2N - 2$ unknowns in equations (4). Thus
the minimum number of stimuli for a solution by equations (11) is five, in
which case there will be ten equations and ten unknowns.
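The counting argument can be checked mechanically. With N stimuli there are N(N − 1)/2 paired comparisons, against 2N − 2 unknowns for system (4) and 2N unknowns for system (11). The helper below is illustrative. It compares counts only and does not test the linear independence of the equations.

```python
def minimum_stimuli(unknowns) -> int:
    """Smallest N for which the N(N - 1)/2 observation equations are at least as
    many as the unknowns, where unknowns(N) gives the count for a given system."""
    n = 2
    while n * (n - 1) // 2 < unknowns(n):
        n += 1
    return n


for label, count in [("equations (4), 2N - 2 unknowns", lambda n: 2 * n - 2),
                     ("equations (11), 2N unknowns", lambda n: 2 * n)]:
    print(label, "-> N =", minimum_stimuli(count))
```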

We may state equations (11) in matrix form as follows: 


EF = G, (12) 


where the three matrices are as shown below for five stimuli: 


























| E a G 
, - Vi Xis or Xie ke s, <AN Yu 
| 
Q 3 Ti ‘. 575 a 555 Xu} | S| a Ni®s 
2 7, Xi Fi i 7 Xu} | S| a N Xi 
| 2 1 vk vk vk Sy | aN X15 
| 1 1 1 ~pXn WP Vi Xu Ss | | cs N Xas 
1 vi Ris, Wi Ti v7 7% [a | | oe N Xn 
1121 Wi a Vi His aA me | |e | | 7 N Xs 
2 Vi a Wi i Tr Pi e | ay N Xu 
BD i ae 
The greater symmetry of these three matrices is apparent.
For five stimuli the solution of equation (12) is simply
$$F = E^{-1}G, \tag{13}$$













while for more than five stimuli the least-squares solution of equation (12) is
$$F = (E'E)^{-1}E'G. \tag{14}$$


The matrix products E’E and E’G, which are needed in this solution, 
have some interesting properties that will not, however, be discussed further 
here. Suffice it to say that certain sections of those matrix products could 
more easily be formed by the application of simple rules than by actually 
carrying out the matrix multiplications. These rules are readily inferred 
once the matrix multiplications have been carried out symbolically for several 
different values of N. The same kind of regularity holds for the matrix 
products A’A and A’C of equation (7). 
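Equations (11), (12) and (14) can be sketched in the same way. The code assumes NumPy and uses hypothetical function names. It builds one row of E and one entry of G per pair, in lexical order, exactly as in equations (11).

One caution comes from our own algebra rather than from the paper. Suppose the X values satisfy equation (1) without error. Then adding any multiple of the vector (S_1, ..., S_N, σ_1 − N/(N − 2), ..., σ_N − N/(N − 2)), built from the true values, leaves every equation (11) satisfied, so E′E is singular. The sketch therefore calls NumPy's least-squares routine instead of forming (E′E)⁻¹ explicitly. Whenever E′E is invertible, the two give the same F. The check below confirms only that the construction of E and G matches equations (11).

```python
import itertools
import math

import numpy as np


def symmetric_system(X):
    """E and G of equation (12): one row per pair (j, k), j < k, in lexical order;
    unknowns ordered S_1..S_N, sigma_1..sigma_N."""
    n = X.shape[0]
    rows, rhs = [], []
    for j, k in itertools.combinations(range(n), 2):
        c = X[j, k] / math.sqrt(2.0)
        row = np.ones(2 * n)              # every S_i, from adding equation (8)
        row[n:] = c                       # c * every sigma_i, from equation (10)
        row[j] = 2.0                      # S_j appears in (1) and in (8)
        row[k] = 0.0                      # S_k cancels
        row[n + j] = row[n + k] = 0.0     # sigma_j and sigma_k cancel
        rows.append(row)
        rhs.append(n * c)                 # N X_jk / sqrt(2)
    return np.array(rows), np.array(rhs)


def solve_symmetric(X):
    """Least-squares F for EF = G, as in equation (14), via numpy.linalg.lstsq."""
    E, G = symmetric_system(X)
    F, *_ = np.linalg.lstsq(E, G, rcond=None)
    return F, E, G


# Hypothetical six-stimulus data with sum(S) = 0 and sum(sigma) = N = 6:
S_true = np.array([-1.2, -0.3, 0.1, 0.4, 1.0, 0.0])
sigma_true = np.array([0.7, 1.3, 0.9, 1.2, 0.9, 1.0])
X = math.sqrt(2.0) * (S_true[:, None] - S_true[None, :]) / (sigma_true[:, None] + sigma_true[None, :])
F_hat, E, G = solve_symmetric(X)
print(E.shape, np.abs(E @ F_hat - G).max())
```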

A slight variant of the solution expressed in equation (14) would be to
divide each of equations (11) through by the corresponding $X_{jk}$ before forming
the three matrices involved in the solution. This would have the disadvantage
of complicating the formation of the coefficient matrix, but it would
have the advantage of giving each of the observation equations equal
importance in the minimizing process. By contrast, consider the effect of
multiplying, let us say, the first of equations (11) through by 100 before
obtaining the least-squares solution for a problem involving more than five
stimuli. The resulting solution would give a very good fit for that first
equation, with relatively little attention paid to the agreement between
the left and right members of the other observation equations when the
S and σ values that have been found are substituted into them.
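Both the thought experiment above (scaling one observation equation by 100) and the variant of dividing each equation by its X_jk amount to row weights in an ordinary least-squares fit. The toy system below is random and hypothetical, and NumPy is assumed. It shows that raising the weight on the first equation pulls that equation's residual toward zero. Dividing by X_jk corresponds to weights 1/X_jk, which cannot be applied to any pair with X_jk = 0, that is, a proportion of exactly .5.

```python
import numpy as np


def weighted_fit_residuals(A, b, weights):
    """Multiply each observation equation (row) by its weight, solve by least
    squares, and return the residuals of the original, unweighted equations."""
    w = np.asarray(weights, dtype=float)
    x, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
    return b - A @ x


rng = np.random.default_rng(0)           # arbitrary overdetermined system
A = rng.normal(size=(12, 4))
b = rng.normal(size=12)
plain = weighted_fit_residuals(A, b, np.ones(12))
w = np.ones(12)
w[0] = 100.0                             # "multiply the first equation by 100"
heavy = weighted_fit_residuals(A, b, w)
print(abs(plain[0]), abs(heavy[0]))      # the first residual does not grow
```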

It would of course be possible to obtain a least-squares solution for 
equations (1) by using still other combinations of conditional equations for 
the purpose of fixing the origin and the unit of measurement. For example, 
equations (2) and (9), or (8) and (3), or any other pair of analogous equations 
could be used at the discretion of the investigator. 

In practical problems with the method of paired comparisons it is often
necessary to eliminate certain of the linear observation equations because of
the instability (when the proportion of judgments j > k is greater than .95
or less than .05) or indeterminacy (when that proportion is equal to 1 or 0)
of the X values involved. Such a reduction of the number of rows in E and G
would, of course, destroy the complete lexical order of the equations and the
simple rules of formation of the matrix products. One way of getting around
this complication would be to work with several overlapping subsets of
stimuli, for each of which all X values are reasonably stable, and then to
combine the resulting overlapping sets of scale values into a single composite
set by some graphical fitting procedure. This same approach could be used to
lighten the computational load even when all X values are usable, although
it and other such labor-saving reductions would work against the primary
purpose of a least-squares solution. The limiting case of such an elimination
would be to retain, on the basis of appropriate reliability considerations,
only a “best” set of 2N equations containing all of the unknowns. The matrix
E would then be square, and hence the unique solution of equation (12) would
be simply equation (13). A still shorter solution, if such a complete reduction
is contemplated, would be by means of equation (6), for a square matrix A is
of order two less than a square matrix E, so that equation (6) involves
2N − 2 rather than 2N unknowns, 2N − 2 rather than 2N observation equations,
and the computation of an inverse of order 2N − 2 rather than 2N. These
complete elimination procedures have little to recommend them, however, for
the computational load will still be so heavy that it can hardly be regarded
as worthwhile if it does not lead to some kind of best fit.
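The screening step can be sketched with the cut-offs of .05 and .95 given above. Proportions of exactly 0 or 1 fall outside that range automatically. The function names and the small proportion matrix are hypothetical. Checking that every stimulus still appears in some retained equation is a necessary, though not sufficient, condition for the reduced system to determine all the unknowns.

```python
from statistics import NormalDist


def usable_deviates(P, low=0.05, high=0.95):
    """Map (j, k), j < k, to X_jk, dropping pairs whose proportion of judgments
    j > k lies outside [low, high]; 0 and 1 are always dropped."""
    unit_normal = NormalDist()
    n = len(P)
    kept = {}
    for j in range(n):
        for k in range(j + 1, n):
            p = P[j][k]
            if low <= p <= high:
                kept[(j, k)] = unit_normal.inv_cdf(p)
    return kept


def covers_all_stimuli(kept, n):
    """True if every stimulus still appears in at least one retained equation."""
    seen = {i for pair in kept for i in pair}
    return seen == set(range(n))


P = [[0.50, 0.62, 0.97],      # hypothetical proportions of judgments j > k
     [0.38, 0.50, 0.71],
     [0.03, 0.29, 0.50]]
kept = usable_deviates(P)
print(sorted(kept), covers_all_stimuli(kept, 3))
```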

Of course the main practical disadvantage of the solution described here 
is the computational labor involved in calculating the inverse of a matrix of 
order 2N, 2N — 1 or 2N — 2, even for the smallest possible N. Accordingly, 
this solution is perhaps, for the present at least, more of theoretical than of 
practical interest. 


REFERENCES 


1. Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 273-286.

2. Thurstone, L. L. Stimulus dispersions in the method of constant stimuli. J. exp.
Psychol., 1931, 15, 284-297.

3. Turnbull, H. W., and Aitken, A. C. An introduction to the theory of canonical
matrices. London: Blackie and Son, Limited, 1932.




Manuscript received 2/18/52 


Revised manuscript received 7/10/52 




















PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


AN ANALYTICAL SOLUTION FOR APPROXIMATING 
SIMPLE STRUCTURE IN FACTOR ANALYSIS 


JOHN B. CARROLL
HARVARD UNIVERSITY 


It is proposed that a satisfactory criterion for an approximation to
simple structure is the minimization of the sums of cross-products (across
factors) of squares of factor loadings. This criterion is completely analytical
and yields a unique solution; it requires no plotting, nor any decisions as to
the clustering of variables into subgroups. The equations involved appear to
be capable only of iterative solution; for more than three or four factors the
computations become extremely laborious but may be feasible for high-speed
electronic equipment. Either orthogonal or oblique solutions may be achieved.
For illustration, the Johnson-Reynolds study of “flow” and “selection”
factors and the Thurstone box problem are reanalyzed. The presence of
factorially complex tests produces a type of hyperplanar fit which the
investigator may wish to adjust by graphical rotations; the smaller the number
of such tests, the more closely the criterion approximates simple structure.


A criticism of current practice in multiple-factor analysis is that the
transformation of the initial factor matrix F to a rotated, “simple structure”
matrix V must apparently be accomplished by methods which allow
considerable scope for subjective judgment. There has been much discussion
of whether anything approaching a unique solution can be achieved under
these conditions; the customary answer is that highly similar solutions can
be reached by two analysts working independently, provided that they
follow the same set of principles. This, of course, is not an entirely
satisfactory answer. Graphical rotation unfortunately partakes more of art than
of science. The few efforts to reduce subjectivity in rotation to simple structure
have not been completely successful. Horst’s methods (1) depend upon
subjective decisions regarding the sub-grouping of variables, and Tucker’s
semi-analytical method (4) uses graphical methods as an aid in selecting
such sub-groups.

It is the purpose of this paper to present a method for approximating
simple structure which completely avoids subjective decisions. The possibility
of alternative criteria must be admitted, but the present method appears
to lead rather closely to the type of solution which is desired in
multiple-factor analysis. Most important, it yields a unique solution. The reason
for emphasizing that the method yields only an approximation to simple
structure will be discussed later.










Thurstone (3, 335) lists the following desirable characteristics of a 
simple structure matrix: 


(1) Each row of the oblique factor matrix V should have at least one
zero.

(2) For each column p of the factor matrix V there should be a distinct
set of r linearly independent tests whose factor loadings $v_{jp}$ are zero.

(3) For every pair of columns of V there should be several tests whose
entries $v_{jp}$ vanish in one column but not in the other.

(4) For every pair of columns of V, a large proportion of the tests 
should have zero entries in both columns. This applies to factor problems 
with four or five or more common factors. 

(5) For every pair of columns there should preferably be only a small 
number of tests with non-vanishing entries in both columns. 


It is obvious that there could hardly be any single mathematical expression 
which would embody all these characteristics. If we consider characteristics 
(3), (4), and (5), however, it would seem that some sort of inner-product 
function of the columns of V should be at a minimum if simple structure is 
to be attained. It might occur to one, for example, that the sum of the non- 
diagonal entries in the matrix product V’V should be as close to zero as 
possible. This solution must be ruled out because under orthogonal trans- 
formation of V a zero sum can be achieved merely when positive and negative 
cross-products balance. If we form, from V, a new matrix T containing the 
squares of the entries in V, we can minimize the sum of the non-diagonal 
entries in T’T and obtain an approximation to simple structure. Such a 
criterion appears to satisfy rather well (within the limitations of a single 
analytical expression) the characteristics of simple structure listed by Thur- 
stone. The similarity to least-squares criteria used in other branches of 
statistics will be evident. 
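A two-test toy example (the matrices are hypothetical, and NumPy is assumed) shows the argument against using V′V directly and for squaring the loadings first. When two tests each load on both factors with cross-products of opposite sign, the off-diagonal sum of V′V cancels to zero. The corresponding sum for T′T, with T the matrix of squared loadings, does not cancel.

```python
import numpy as np


def off_diagonal_sums(V):
    """Return (sum over p < q of (V'V)_pq, sum over p < q of (T'T)_pq), T = V**2."""
    upper = np.triu_indices(V.shape[1], k=1)
    T = V ** 2                                   # squares of the loadings
    return (V.T @ V)[upper].sum(), (T.T @ T)[upper].sum()


complex_tests = np.array([[1.0, 1.0],            # two tests loading on both factors,
                          [1.0, -1.0]])          # with cross-products of opposite sign
simple_tests = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
print(off_diagonal_sums(complex_tests))   # V'V term cancels to 0; T'T term is 2
print(off_diagonal_sums(simple_tests))    # both terms are 0
```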

The following notation will be the basis of the subsequent development: 


$j = 1, 2, \dots, n$ (tests);
$m, k = 1, 2, \dots, s$ (arbitrary reference factors);
$p, q = 1, 2, \dots, t$ (rotated factors); $s = t$;
$F = \|a_{jm}\|$ = the initial matrix of projections of $n$ test vectors on $s$
arbitrary reference factors;
$\Lambda = \|\lambda_{mp}\|$ = the matrix of direction cosines of reference vectors of
rotated factors;
$V = \|v_{jp}\|$ = the matrix of projections of test vectors on rotated
reference vectors;
$T = \|v_{jp}^2\|$ = matrix of squares of entries in $V$;
$\Omega = T'T = \|\omega_{pq}\|$ = a square symmetric matrix of inner products of
columns of $T$.



















It should be noted that F can be any initial orthogonal matrix; it may have 
been obtained by the centroid method, the multiple-group method, or any 
other method producing a matrix such that FF’ yields the correlation matrix 
plus a matrix of residuals. 

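The proposed criterion for approximating simple structure is that the
sum of the non-diagonal elements of $\Omega$ be a minimum. That is, if we use
only the entries on one side of the diagonal,

$$f = \sum_{p<q} \omega_{pq} = \text{a minimum}. \tag{1}$$

Now, since

$$F\Lambda = V, \tag{2}$$

it is evident that

$$v_{jp}^2 = \Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mp}\Bigr)^2, \tag{3}$$

and equation (1) becomes

$$f = \sum_{p<q}\sum_{j=1}^{n} v_{jp}^2\, v_{jq}^2
= \sum_{p<q}\sum_{j=1}^{n}\Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mp}\Bigr)^2\Bigl(\sum_{m=1}^{s} a_{jm}\lambda_{mq}\Bigr)^2. \tag{4}$$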


Multiplying out and appropriately collecting terms, we find that any $\omega_{pq}$
(which is, of course, a scalar) can be conveniently expressed by the equation

$$\omega_{pq} = M_p' A M_q, \tag{5}$$

where $M_p'$ is a row vector with $s(s+1)/2$ entries, that is,

$$M_p' = \|\lambda_{1p}^2,\ \lambda_{2p}^2,\ \dots,\ \lambda_{sp}^2,\ \lambda_{1p}\lambda_{2p},\ \dots,\ \lambda_{1p}\lambda_{sp},\ \lambda_{2p}\lambda_{3p},\ \dots,\ \lambda_{2p}\lambda_{sp},\ \dots,\ \lambda_{(s-1)p}\lambda_{sp}\|;$$

$M_q$ is a column vector constructed in the same manner as $M_p$ but with a
change of subscript; and $A$ is a square symmetric matrix of order $s(s+1)/2$
which may be specified as equal to $\Xi'\Xi$, where $\Xi$ is an $n \times s(s+1)/2$ matrix,
the first $s$ columns of which may be represented as $\|a_{jm}^2\|$, and the remaining
$s(s-1)/2$ columns as $\|2a_{jm}a_{jk}\|$, $k > m$.
The order of the entries, with respect to combinations of the subscripts $m$
and $k$, matches that in $M_p$. Thus,

$$\Xi = \|a_{jm}^2,\ 2a_{jm}a_{jk}\|.$$


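For example, if $s = t = 3$,

$$\Xi = \begin{bmatrix}
a_{11}^2 & a_{12}^2 & a_{13}^2 & 2a_{11}a_{12} & 2a_{11}a_{13} & 2a_{12}a_{13}\\
\vdots & & & & & \vdots\\
a_{j1}^2 & a_{j2}^2 & a_{j3}^2 & 2a_{j1}a_{j2} & 2a_{j1}a_{j3} & 2a_{j2}a_{j3}\\
\vdots & & & & & \vdots\\
a_{n1}^2 & a_{n2}^2 & a_{n3}^2 & 2a_{n1}a_{n2} & 2a_{n1}a_{n3} & 2a_{n2}a_{n3}
\end{bmatrix},$$

and $\Xi'\Xi$ becomes (the matrix is symmetric; only the upper triangle is shown)

$$\Xi'\Xi = \begin{bmatrix}
\sum a_{j1}^4 & \sum a_{j1}^2a_{j2}^2 & \sum a_{j1}^2a_{j3}^2 & 2\sum a_{j1}^3a_{j2} & 2\sum a_{j1}^3a_{j3} & 2\sum a_{j1}^2a_{j2}a_{j3}\\
 & \sum a_{j2}^4 & \sum a_{j2}^2a_{j3}^2 & 2\sum a_{j1}a_{j2}^3 & 2\sum a_{j1}a_{j2}^2a_{j3} & 2\sum a_{j2}^3a_{j3}\\
 & & \sum a_{j3}^4 & 2\sum a_{j1}a_{j2}a_{j3}^2 & 2\sum a_{j1}a_{j3}^3 & 2\sum a_{j2}a_{j3}^3\\
 & & & 4\sum a_{j1}^2a_{j2}^2 & 4\sum a_{j1}^2a_{j2}a_{j3} & 4\sum a_{j1}a_{j2}^2a_{j3}\\
 & & & & 4\sum a_{j1}^2a_{j3}^2 & 4\sum a_{j1}a_{j2}a_{j3}^2\\
 & & & & & 4\sum a_{j2}^2a_{j3}^2
\end{bmatrix},$$

where all summations are over $j$.
We can generalize equation (5) and write

$$\Omega = M'AM, \tag{6}$$

where $M$ is a matrix of order $[s(s+1)/2] \times t$, the successive columns of
which are the column vectors previously represented as $M_p$.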
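Equations (5) and (6) can be checked numerically. Building Ξ and M as defined above and forming M′(Ξ′Ξ)M should reproduce T′T computed directly from V = FΛ. The sketch assumes NumPy, and the function names are illustrative. The usage applies it to the centroid loadings F and the orthogonal Λ of Table 1 (Johnson-Reynolds data). The printed result can be compared with the matrix Ω reported there for the orthogonal case.

```python
import itertools

import numpy as np


def xi_matrix(F):
    """Xi: columns a_jm**2 (m = 1..s), then 2 a_jm a_jk for m < k in lexical order."""
    s = F.shape[1]
    cols = [F[:, m] ** 2 for m in range(s)]
    cols += [2.0 * F[:, m] * F[:, k] for m, k in itertools.combinations(range(s), 2)]
    return np.column_stack(cols)


def m_matrix(Lam):
    """M: rows lambda_mp**2, then lambda_mp * lambda_kp (m < k); one column per factor p."""
    s = Lam.shape[0]
    rows = [Lam[m] ** 2 for m in range(s)]
    rows += [Lam[m] * Lam[k] for m, k in itertools.combinations(range(s), 2)]
    return np.vstack(rows)


def omega_two_ways(F, Lam):
    """Omega computed directly as T'T and via equation (6), M'AM with A = Xi'Xi."""
    T = (F @ Lam) ** 2
    Xi = xi_matrix(F)
    M = m_matrix(Lam)
    return T.T @ T, M.T @ (Xi.T @ Xi) @ M


F = np.array([[.303, -.476], [.361, -.355], [.380, -.173], [.697, -.080],
              [.810, .232], [.801, .202], [.743, .265], [.619, .281],
              [.775, .183], [.758, -.161]])
Lam = np.array([[.974, .228], [.228, -.974]])
direct, via_equation_6 = omega_two_ways(F, Lam)
print(np.round(via_equation_6, 3))
```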

It would be natural, at this point, to attempt to solve for the desired
transformation matrix $\Lambda$ which would minimize the criterion $f$. This would
be done by the methods of the calculus. The partial derivative of equation (4)
with respect to each $\lambda_{mp}$ would be determined and set equal to zero, under the
restriction that $\sum_{m=1}^{s}\lambda_{mp}^2 = 1$. The additional restriction
$\Lambda'\Lambda = I$ could be imposed, if desired, in order to make the solution
orthogonal. The equations which would result from this process have been found
to be complex, even for the simple case of two factors, and do not appear
readily soluble by any known mathematical methods.

It therefore seems advisable to minimize the criterion by iterative
methods, that is, by systematically varying values of $\lambda_{mp}$ until the function
$f$ becomes stationary at its minimal value. As will be shown below, this is
perfectly feasible by ordinary computational techniques when the number
of factors does not exceed three or four, or perhaps five. For a larger number
of factors, high-speed electronic computing equipment may enable a solution,
but it should be remembered that the matrices involved are of
order $s(s+1)/2$. Thus, for ten factors, the criterion is a sum of scalar products
of matrices which have as many as 55 rows or columns, or both. Furthermore,
the number of iterations or trials required will increase tremendously.
The laboriousness of the preparatory calculations to determine
the matrix $A$ will depend also on the number of tests, although the number
of tests does not affect the principal part of the procedure, i.e., the iterations.

Let us suppose that we are interested in systematically varying the values
for one column of $\Lambda$, e.g., $\lambda_x$, while the remaining columns stay constant.
(This implies that we are allowing an oblique structure; if orthogonality is
to be maintained, we should have to vary two columns of $\Lambda$ systematically
while the remaining columns stay constant.) The one column which is being
varied is designated $x$, while the remaining columns are designated $r$. Under
these conditions the value of $f$ changes only as a function of $\lambda_x$, and since
$M_x$ is a function of $\lambda_x$, we may write the function to be minimized as

$$f_x = \sum_{r=1}^{t-1} \omega_{xr} = \sum_{r=1}^{t-1} M_x' A M_r, \tag{7}$$

and, defining

$$M_R = \sum_{r=1}^{t-1} M_r,$$

we write equation (7) as

$$f_x = M_x' A M_R. \tag{8}$$


We will then be in a position to make iterations by using trial values
of $\lambda_x$, determining $M_x'$, and postmultiplying by the column vector $AM_R$.
$AM_R$ will, of course, remain constant during the trials for minimizing $f_x$.
Having found this minimum, we can choose another column of $\Lambda$ for $\lambda_x$,
recompute $AM_R$ using the previous results, and iterate to find a minimal
value of the new function $f_x$. If we proceed in this way until the function
has become stationary for every column of $\Lambda$, the solution is eventually
found. Experience seems to show that the solution converges quite rapidly.
The final solution will give two sets of values for each column of $\Lambda$, differing
only in sign. On the general assumption of a positive manifold, we shall
take the value of $\lambda_x$ which gives a positive sum of values in the corresponding
column of $V$. The direction in which each vector is to be taken can easily be
determined by premultiplying each column vector of $\Lambda$ by a row vector
consisting of the column sums of entries in $F$; the column of $\Lambda$ must be
reversed in sign if this product is negative.
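The column-by-column procedure can be sketched as follows, for the oblique case only. One substitution should be flagged. Instead of the trial values of λ_x described above, the sketch uses our own observation that, with the other columns held fixed, f_x is a quadratic form λ_x′Hλ_x. Here H = F′ diag(w) F, where w_j is the sum of the squared loadings of test j on the other factors. Under the restriction λ_x′λ_x = 1, the minimizing column is therefore the eigenvector of H with the smallest eigenvalue.

Each column step consequently cannot increase f. The procedure is still iterative, however, and may settle in a local minimum. The sign rule is the column-sum rule just described. NumPy is assumed, and the names are illustrative.

```python
import numpy as np


def criterion(F, Lam):
    """f = sum over p < q of omega_pq, with V = F Lam and T = V**2."""
    T = (F @ Lam) ** 2
    return (T.T @ T)[np.triu_indices(Lam.shape[1], k=1)].sum()


def oblique_column_sweeps(F, Lam0=None, max_sweeps=100, tol=1e-12):
    """Minimise f one column of Lam at a time (oblique case, unit-length columns).

    With the other columns fixed, f_x = sum_j v_jx**2 * w_j = lam' H lam, where
    w_j = sum over r != x of v_jr**2 and H = F' diag(w) F; the best unit
    vector is the eigenvector of H with the smallest eigenvalue."""
    Lam = np.eye(F.shape[1]) if Lam0 is None else np.array(Lam0, dtype=float)
    Lam /= np.linalg.norm(Lam, axis=0)            # direction cosines: unit columns
    column_sums = F.sum(axis=0)
    history = [criterion(F, Lam)]
    for _ in range(max_sweeps):
        for x in range(Lam.shape[1]):
            V2 = (F @ Lam) ** 2
            w = V2.sum(axis=1) - V2[:, x]         # squared loadings on other factors
            H = F.T @ (w[:, None] * F)
            lam = np.linalg.eigh(H)[1][:, 0]      # eigh sorts eigenvalues ascending
            if column_sums @ lam < 0:             # positive-manifold sign choice
                lam = -lam
            Lam[:, x] = lam
        history.append(criterion(F, Lam))
        if history[-2] - history[-1] < tol:
            break
    return Lam, history


rng = np.random.default_rng(7)                    # hypothetical initial factor matrix
F = rng.normal(size=(15, 3))
Lam, history = oblique_column_sweeps(F)
print(round(history[0], 4), round(history[-1], 4), len(history) - 1)
```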


Numerical Examples 


As an example of the use of the present criterion with a study involving
two factors, we take the data analyzed by Johnson and Reynolds (2). Table 1
shows the original centroid matrix F, the preparatory calculations leading
to the matrix A, the iterated solutions for M and hence for Λ, and the resulting
factor matrices for both the orthogonal and the oblique case. (Since the actual
procedure for iteration can be more usefully illustrated for the three-factor
case, this procedure is not shown for the Johnson-Reynolds data.) Table 1
also shows the matrix Ω; the non-diagonal entry of Ω stands at .122 for the
orthogonal case and at .041 for the oblique case. Figure 1 shows a plot of


TABLE 1

Orthogonal and Oblique Analytical Solutions for the Data of
Johnson and Reynolds

              F                        Ξ                  Orthogonal        Oblique
                                                          FΛ = V            FΛ = V
Test      I       II      a_j1²   a_j2²   2a_j1a_j2        A       B        A       B
  1     .303    −.476     .092    .227     −.288         .187    .533    −.232    .538
  2     .361    −.355     .130    .126     −.256         .271    .428    −.099    .436
  3     .380    −.173     .144    .030     −.131         .331    .255     .064    .265
  4     .697    −.080     .486    .006     −.112         .661    .237     .315    .256
  5     .810     .232     .656    .054      .376         .842   −.041     .638   −.017
  6     .801     .202     .642    .041      .324         .826   −.014     .608    .010
  7     .743     .265     .552    .070      .394         .784   −.089     .629   −.066
  8     .619     .281     .383    .079      .348         .667   −.133     .574   −.113
  9     .775     .183     .601    .033      .284         .797   −.002     .578    .021
 10     .758    −.161     .575    .026     −.244         .702    .330     .281    .350

        Λ (orthogonal)             Λ (oblique)
           A       B                  A       B
  I      .974    .228        I      .548    .256
  II     .228   −.974        II     .836   −.967

        Ξ′Ξ = A
           I       II     I·II
  I      2.268    .210    .703
  II      .210    .086   −.011
  I·II    .702   −.011    .841

        M (orthogonal)             M (oblique)
           A       B                  A       B
  I      .948    .052        I      .301    .066
  II     .052    .948        II     .699    .934
  I·II   .222   −.222        I·II   .459   −.247

        M′AM = Ω (orthogonal)      M′AM = Ω (oblique)
           A       B                  A       B
  A     2.396    .122        A      .700    .041
  B      .122    .134        B      .041     …


















































Figure 1 


Comparison of Two Orthogonal Solutions for the Data of Johnson and Reynolds (2). 
(The broken lines represent Johnson and Reynolds’ own solution; the solid lines represent 
the analytical solution.) 













the factor loadings of the tests on the original arbitrary reference axes; it 
also shows the rotated planes produced by (a) Johnson and Reynolds’ own 
(orthogonal) solution, and (b) an orthogonal solution produced by the 
present criterion. Figure 2 similarly shows (a) the oblique solution which 
the writer would be inclined to make by the method of graphical inspection, 
and (b) the oblique solution produced by the present criterion. 

As a numerical example of a three-factor case, we choose the data of
Thurstone’s well-known “box problem” (3, 136). Table 2 presents the original
matrix F and an oblique solution achieved by the method of the present 
paper. Figure 3 depicts the position of the three hyperplanes arrived at by 
the present solution and compares them with those computed by Thurstone. 

Table 3 has been prepared to show in skeleton form a part of the iterations
which were made to achieve the solution presented in Table 2. One complete
cycle of iterations is shown, together with an outline of the subsequent cycles.
It will be noted that the solution had stabilized by the fourth cycle of
iteration. Iteration was pursued until no change of .01 in any value
of $\lambda_{mx}$ produced a reduction as great as .001 in the value of $f_x$.
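The stopping rule just described can be written out for a single column. The sketch makes one assumption that the text does not specify: a "change of .01" means adding ±.01 to one element of λ_x and then rescaling the column to unit length. NumPy is assumed, and the names are illustrative.

```python
import numpy as np


def trial_search_column(F, Lam, x, step=0.01, min_gain=0.001, max_rounds=10_000):
    """Vary column x of Lam by trial values: change one element by +/- step,
    rescale to unit length, and keep the trial only if f_x drops by at least
    min_gain. Stop when no single trial change achieves that reduction."""
    Lam = np.array(Lam, dtype=float)
    V2 = (F @ Lam) ** 2
    w = V2.sum(axis=1) - V2[:, x]                 # other columns held fixed

    def f_x(lam):
        return float(((F @ lam) ** 2) @ w)

    current = Lam[:, x] / np.linalg.norm(Lam[:, x])
    best = f_x(current)
    for _ in range(max_rounds):
        improved = False
        for m in range(current.size):
            for delta in (step, -step):
                trial = current.copy()
                trial[m] += delta
                trial /= np.linalg.norm(trial)
                value = f_x(trial)
                if best - value >= min_gain:
                    current, best, improved = trial, value, True
        if not improved:
            break
    Lam[:, x] = current
    return Lam, best


rng = np.random.default_rng(1)                    # hypothetical initial factor matrix
F = rng.normal(size=(12, 3))
Lam0 = np.eye(3)
Lam1, best = trial_search_column(F, Lam0, 0)
print(np.round(Lam1[:, 0], 3), round(best, 4))
```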


Discussion 


The criterion proposed here must be regarded as yielding only an approxi- 
mation to simple structure precisely for the reason that the solution will 
be to some extent different from that yielded by the best graphical methods, 
and will in general fail to satisfy as well as possible the desiderata of simple 
structure. It is the writer’s opinion that graphical methods, properly used, 
still afford the most desirable way of obtaining a final simple structure. 
By graphical methods, supplemented by such techniques as Tucker’s (4), 
one can fit hyperplanes in a highly acceptable manner, and without certain 
of the disadvantages of solutions reached by the present analytical method. 

Let us consider, however, the advantages of the present method. 

(1) It is a unique solution attainable by solely analytical techniques and 
hence requires no subjective judgment at any point. Furthermore, it requires 
no factor plots. 

(2) It gives a delineation of the approximate location of the best-fitting 
hyperplanes in the factor space. It is believed that an investigator would 
have to make only small additional rotations to bring the solution to the 
form which would satisfy graphical criteria. 

(3) It takes account of all the data in a given study. If there are fac- 
torially complex tests in a study, the criterion utilizes these indiscriminately 
in defining hyperplanes. For example, in Figure 2, the factorially complex 
tests 4 and 10 influence the positioning of plane A (and also of plane B). 
Likewise, the many factorially complex “tests” in Thurstone’s box problem
influence the positioning of the hyperplanes, as shown in Figure 3. At the 
same time, the more tests there are to define a hyperplane, and the closer 








FIGURE 2 


Comparison of Two Oblique Solutions for the Data of Johnson and Reynolds (2). (The 
broken lines represent an inspectional graphical solution; the solid lines represent the 
analytical solution.) 







TABLE 2
An Oblique Analytical Solution for Thurstone's Box Problem

              F                   Squares and Cross-Products of F                V*
Test      I     II    III       I²    II²   III²   I·II  I·III II·III        A      B      C
   1   .659  -.736   .138     .434   .542   .019  -.485   .091  -.102    -.132   .900  -.094
   2   .725   .180  -.656     .526   .032   .430   .130  -.476  -.118     .865  -.169  -.153
   3   .665   .537   .500     .442   .288   .250   .357   .332   .268    -.113  -.105   .902
   4   .869  -.209  -.443     .755   .044   .196  -.182  -.385   .093     .606   .293  -.164
   5   .834   .182   .508     .696   .033   .258   .152   .424   .092    -.167   .264   .763
   6   .836   .519   .152     .699   .269   .023   .434   .127   .079     .251  -.158   .686
   7   .856  -.452  -.269     .733   .204   .072  -.387  -.230   .122     .377   .566  -.175
   8   .848  -.426   .320     .719   .181   .102  -.361   .271  -.136    -.158   .744   .282
   9   .861   .416  -.299     .741   .173   .089   .358  -.257  -.124     .643  -.215   .296
  10   .880  -.341  -.354     .774   .116   .125  -.300  -.311   .121     .492   .445  -.168
  11   .889  -.147   .436     .790   .021   .190  -.131   .388  -.064    -.175   .548   .541
  12   .875   .485  -.093     .766   .235   .009   .424  -.081  -.045     .478  -.201   .495
  13   .667  -.725   .109     .445   .526   .012  -.484   .073  -.079    -.100   .882  -.107
  14   .717   .246  -.619     .514   .061   .383   .176  -.444  -.152     .847  -.218  -.090
  15   .634   .501   .522     .402   .251   .272   .318   .331   .262    -.152  -.075   .888
  16   .936   .257   .165     .876   .066   .027   .240   .154   .042     .197   .109   .581
  17   .966  -.239  -.083     .933   .057   .007  -.231  -.080   .020     .297   .474   .123
  18   .625  -.720   .166     .391   .518   .028  -.450   .104  -.120    -.164   .885  -.075
  19   .702   .112  -.650     .493   .013   .422   .079  -.456  -.073     .834  -.114  -.194
  20   .664   .536   .488     .441   .287   .238   .356   .324   .262    -.102  -.108   .892

A
             I²     II²    III²    I·II   I·III  II·III
I²        8.469   2.131   1.800    .032   -.350    .380
II²       2.131   1.328    .377   -.707    .625    .117
III²      1.800    .377    .886    .633   -.497    .172
I·II       .032   -.707    .633   8.529    .757   1.254
I·III     -.350    .625   -.497    .757   7.205   1.265
II·III     .380    .117    .172   1.254   1.265   1.510

Λ
             A       B       C
I         .297    .300    .332
II        .274   -.889    .566
III      -.915    .346    .755

M'
             I²     II²    III²    I·II   I·III  II·III
A          .088    .075    .837    .081   -.272   -.251
B          .090    .790    .120   -.267    .104   -.308
C          .110    .320    .570    .188    .251    .427

M'AM
              A       B       C
A         2.009    .307    .267
B          .307   2.485    .265
C          .267    .265   2.756

f = Σ M_p'AM_q (summed over p < q) = .839†

*Our A corresponds to Thurstone's B; our B corresponds to Thurstone's A; our C corresponds to Thurstone's C.
†If we use the transformation matrix Λ provided by Thurstone, f = 1.409.
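The relation among the tabled quantities can be checked numerically. The following Python sketch (an illustration, not part of the original computations) builds each row of M' as the squares and cross-products of the direction cosines in the corresponding column of Λ, forms M'AM with the matrix A transcribed from Table 2, and sums the off-diagonal terms with p < q to recover f. Because every input carries only three decimals, the results should agree with the tabled .307, .267, .265, and .839 only to about the third decimal.

```python
from itertools import combinations

# Matrix A transcribed from Table 2; rows and columns are ordered
# I², II², III², I·II, I·III, II·III.
A = [
    [8.469, 2.131, 1.800, 0.032, -0.350, 0.380],
    [2.131, 1.328, 0.377, -0.707, 0.625, 0.117],
    [1.800, 0.377, 0.886, 0.633, -0.497, 0.172],
    [0.032, -0.707, 0.633, 8.529, 0.757, 1.254],
    [-0.350, 0.625, -0.497, 0.757, 7.205, 1.265],
    [0.380, 0.117, 0.172, 1.254, 1.265, 1.510],
]

# Columns of the transformation matrix Λ of Table 2 (direction cosines on I, II, III).
LAMBDA = {
    "A": (0.297, 0.274, -0.915),
    "B": (0.300, -0.889, 0.346),
    "C": (0.332, 0.566, 0.755),
}


def m_vector(lam):
    """One row of M': squares and cross-products of the direction cosines."""
    l1, l2, l3 = lam
    return [l1 * l1, l2 * l2, l3 * l3, l1 * l2, l1 * l3, l2 * l3]


def quad(u, v):
    """Bilinear form u' A v."""
    return sum(u[i] * A[i][j] * v[j] for i in range(6) for j in range(6))


M = {name: m_vector(lam) for name, lam in LAMBDA.items()}
MAM = {(p, q): quad(M[p], M[q]) for p in M for q in M}
f = sum(MAM[p, q] for p, q in combinations("ABC", 2))
print(round(f, 3))
```

Reversing the sign of a whole column of Λ leaves its row of M', and hence f, unchanged, which is why the signs of A and B may differ between Tables 2 and 3.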



















FIGURE 3
Comparison of Two Oblique Solutions for Thurstone’s Box Problem (3, 136, 228). (Extended Vector Representation on Reference Axes II and III.) [Figure: (1) Thurstone's solution, broken lines; (2) analytical solution, solid lines.]


they are together, the more acceptably the hyperplane will be defined. 
For example, in the Johnson-Reynolds data shown in Figures 1 and 2, the 
relatively large concentration of tests 5, 6, 7, 8, and 9 in the configuration 
makes for a good definition of plane B—close to where it would be put by 
graphical inspection. This is true both for orthogonal and oblique solutions. 
These considerations lead to the conclusion that the present criterion will
probably work best for well-designed factor studies in which there is a large
number of factorially pure tests and a relatively small number of factorially
complex tests.

The disadvantages of the present method seem to be as follows: 

(1) The presence of factorially complex tests makes the primary axes 
more highly correlated than they would be if placed by graphical methods, 
and may give rise to larger negative projections than would otherwise be 










TABLE 3
The Iterations (Skeletonized) for Analytical Solution of the Thurstone Box Problem

                                   M_p'                        f_p =             Λ_p'
Line             I²    II²   III²   I·II  I·III II·III   M_p'AM_R        I     II    III
 1  (M_R)'        0      1      1      0      0      0        *          *      *      *
 2  (AM_R)'   3.931  1.705  1.263  -.074   .128   .289        *          *      *      *
 3  M_A'          0      0      1      0      0      0    1.263          0      0      1
 4             .030      0   .970      0   .171      0    1.365       .174      0   .985
 5             .030      0   .970      0  -.171      0    1.321      -.174      0   .985
 6                0   .030   .970      0      0  -.171    1.226          0   .174  -.985
 7                0      ~      ~      0      0      ~        ~          0      ~      ~
 8                0   .085   .914      0      0  -.279    1.219**        0  -.292   .956
 9             .010   .085   .904  -.029   .095  -.279    1.260       .100  -.292   .954
10             .010   .085   .904   .029  -.095  -.279    1.231      -.100  -.292   .954
11             .010   .075   .914  -.028   .096  -.263    1.259       .100  -.275   .956
12             .010   .075   .914   .028  -.096  -.263    1.231      -.100  -.275   .956

Line
1  Iteration is started by arbitrarily choosing rotated vector A. The other vectors are initially taken collinear with reference axes II and III. Line 1 is hence the sum of the entries in $M_B'$ and $M_C'$, which have only unity in columns II and III, respectively. The asterisks in the columns for $f_p$ and $\Lambda_p'$ show that these columns do not apply to the initial computation of $(M_R)'$ and $(AM_R)'$. Asterisks will similarly appear in the first two rows of computations for each new trial vector, e.g., in lines 13-14, 24-25, etc. For convenience the computation sheet shows all column vectors in transposed form.

2  $(AM_R)'$ is computed using the matrix A in Table 2. Summational checks should be computed, but are not shown here.

3  Although it might appear that our first trial vector for A should be collinear with reference vector I, we find that the minimum value of $f_p$ is obtained by taking it collinear with reference vector III: in that position $f_A = M_A'AM_R = 1.263$, as shown in the column labeled $f_p$. If we had taken vector A collinear with reference vector I, $f_A$ would have been equal to 3.931. The columns labeled $\Lambda_p'$ have been filled out; $M_p'$ is always a function of $\Lambda_p'$.

4 & 5  We shall now start iteration by taking a small move using vectors I and III, using both a positive and a negative value of $\lambda_I\lambda_{III}$. The value .030 is arbitrarily placed in $M_A'$ in column I; the remaining entries of $M_A'$ are determined from this. The sum of the first three entries of $M_A'$ must equal unity; the entries in $\Lambda_A'$ are the square roots of these, and the last three columns of $M_A'$ are cross-products of entries in $\Lambda_A'$. Computing each value of $f_A$, we see (in the column $f_p$) that neither move reduces $f_A$ below the value in line 3.

6  We now try a similar move, but using reference vectors II and III. It immediately appears that $f_A$ decreases if we use some negative value of $\lambda_{II}\lambda_{III}$.

7  The wavy lines (~) are used throughout the table to symbolize the arbitrary variation of entries until the value of $f_p$ becomes stationary. In the present case, the following values of $\lambda_{II}^2$ were tried: .117, .250, .106, .128, .095, .067, and .085. A graph of $f_A$ as a function of $\lambda_{II}^2$ aided in finding the minimal value, which was determined to be approximately as shown in line 8. A single asterisk was placed in line 8 to show that a temporary minimum had been attained.

9 & 10  An attempt was made to reduce $f_A$ by fixing $\lambda_{II}^2$ and varying $\lambda_I^2$ and $\lambda_{III}^2$, but to no avail. Both positive and negative square roots were employed.

11 & 12  A similar attempt was made using various values of $\lambda_I^2$ and $\lambda_{II}^2$, but without any reduction in $f_A$. The vector in line 8 was therefore marked as the final one for this trial, and a second asterisk was placed in line 8 to signify this fact.
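The first trial (lines 1 to 8) can be reproduced in a few lines. This Python sketch assumes the A matrix of Table 2 and, as in line 1, takes B and C collinear with reference axes II and III. It evaluates $f_p = M_A'AM_R$ with A placed on axis III and on axis I and then, in place of the author's trial values and hand-drawn graph, scans the II-III plane on a fine angular grid (the grid is an assumption of the sketch). The scan bottoms out at about 1.220 with $\lambda_{II}^2$ near .08; the tabled 1.219 in line 8 reflects entries rounded to three decimals (.085 + .914 sums to .999).

```python
import math

# Matrix A transcribed from Table 2 (order I², II², III², I·II, I·III, II·III).
A = [
    [8.469, 2.131, 1.800, 0.032, -0.350, 0.380],
    [2.131, 1.328, 0.377, -0.707, 0.625, 0.117],
    [1.800, 0.377, 0.886, 0.633, -0.497, 0.172],
    [0.032, -0.707, 0.633, 8.529, 0.757, 1.254],
    [-0.350, 0.625, -0.497, 0.757, 7.205, 1.265],
    [0.380, 0.117, 0.172, 1.254, 1.265, 1.510],
]


def m_vector(lam):
    """Squares and cross-products of direction cosines (one row of M')."""
    l1, l2, l3 = lam
    return [l1 * l1, l2 * l2, l3 * l3, l1 * l2, l1 * l3, l2 * l3]


def f_p(lam, am_r):
    """f_p = M_p'(A M_R) for a trial vector with direction cosines lam."""
    return sum(m * g for m, g in zip(m_vector(lam), am_r))


# Line 1: B on axis II and C on axis III, so M_R = M_B + M_C.
m_r = [0, 1, 1, 0, 0, 0]
# Line 2: the product A M_R.
am_r = [sum(a * m for a, m in zip(row, m_r)) for row in A]

f_on_iii = f_p((0.0, 0.0, 1.0), am_r)  # line 3
f_on_i = f_p((1.0, 0.0, 0.0), am_r)    # the alternative mentioned in the note to line 3

# Scan the II-III plane, lam = (0, sin t, cos t), in steps of 0.01 degree.
best_t, best_f = 0.0, f_on_iii
for k in range(-9000, 9001):
    t = math.radians(k / 100)
    value = f_p((0.0, math.sin(t), math.cos(t)), am_r)
    if value < best_f:
        best_t, best_f = t, value
best_lam = (0.0, math.sin(best_t), math.cos(best_t))

print(round(f_on_iii, 3), round(f_on_i, 3), round(best_f, 3))
print([round(x, 3) for x in best_lam])
```

Moves in the I-II and I-III planes (lines 4, 5, and 9 to 12) can be explored the same way by changing which two components carry the sine and cosine.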










TABLE 3 (Continued)

                                   M_p'                        f_p =             Λ_p'
Line             I²    II²   III²   I·II  I·III II·III   M_p'AM_R        I     II    III
13  (M_R)'        0   .085  1.914      0      0  -.279        *          *      *      *
14  (AM_R)'   3.520   .802  1.680   .801 -1.251  -.082        *          *      *      *
15  M_B'          0      1      0      0      0      0     .802          0      1      0
16                0   .970   .030      0      0   .171     .814          0   .985   .174
17                0   .970   .030      0      0  -.171     .842          0   .985  -.174
18             .030   .970      0   .171      0      0    1.020       .174   .985      0
19             .030   .970      0  -.171      0      0     .746      -.174   .985      0
20                ~      ~      0      ~      0      0        ~          ~      ~      0
21             .024   .976      0  -.154      0      0     .744**    -.156   .988      0
22             .014   .976   .010  -.117  -.012   .099     .762      -.118   .988   .100
23             .014   .976   .010  -.117   .012  -.099     .748      -.118   .988  -.100
24  (M_R)'     .024  1.061   .914  -.154      0  -.279        *          *      *      *
25  (AM_R)'   3.998  1.881  1.108 -1.834  -.269  -.324        *          *      *      *
26  M_C'          0      0      1      0      0      0    1.108          0      0      1
27                0      ~      ~      0      0      ~        ~          0      ~      ~
28                0   .036   .964      0      0   .188    1.075          0   .191   .982
29                ~      ~   .964      ~      ~      ~        ~          ~      ~   .982
30             .009   .027   .964   .016   .093   .161    1.048**     .095   .164   .982
31                ~   .027      ~      ~      ~      ~        ~          ~   .164      ~
Line
13  We proceed to iterate trial vector B; therefore $M_R'$ contains sums of entries in the $M_A'$ just computed (line 8) and $M_C'$, which, as pointed out for line 1, corresponded to a vector collinear with reference vector III.

14  $(AM_R)'$ is computed using $M_R'$ in line 13 and the matrix A of Table 2.

15  We choose an initial vector collinear with reference vector II because this yields the smallest value of $f_p$.

16-17  An unsuccessful move using vectors II and III.

18-21  A move using vectors I and II; a negative value of $\lambda_I$ reduced $f_p$, so iteration proceeded in this direction (line 20) until a stationary value of $f_p$ was achieved. Again, a graph of the function aided in locating the minimizing values.

22-23  An unsuccessful move using vectors I and III. Another trial with II and III was taken, unsuccessfully, and is not shown here. The vector represented in line 21 was chosen as final for this trial.

24  Since we now proceed with vector C, $M_R'$ is the sum of corresponding entries in $M_A'$ and $M_B'$, lines 8 and 21.

26  The initial vector for C is chosen collinear with III because it gives the smallest $f_p$.

27-28  Iteration varying II and III, reducing $f_p$ to 1.075.

29-30  Iteration varying I and II, reducing $f_p$ to 1.048.

31  Unsuccessful iteration varying I and III, without reduction of $f_p$. The vector in line 30 is chosen as final. In general, iteration proceeds until all possible combinations of vectors have been tried at least once.

















































TABLE 3 (Continued)

                                   M_p'                        f_p =             Λ_p'
Line             I²    II²   III²   I·II  I·III II·III   M_p'AM_R        I     II    III
32  (M_R)'     .033  1.003   .964  -.138   .093   .161        *          *      *      *
33  (AM_R)'   4.176  1.940  1.186 -1.003   .905   .483        *          *      *      *
34  M_A'          0   .085   .914      0      0  -.279    1.114          0  -.292   .956
35                ~   .085      ~      ~      ~      ~        ~          ~  -.292      ~
36             .030   .085   .885   .051  -.163  -.274    1.009      -.173  -.292   .940
37                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
38             .030   .120   .850   .060  -.160  -.319    1.007**    -.173  -.346   .922
39  (M_R)'     .039   .147  1.814   .076  -.067  -.158        *          *      *      *
40  (AM_R)'   3.875   .848  1.787  1.445 -1.448   .116        *          *      *      *
41  M_B'       .024   .976      0  -.154      0      0     .698      -.156   .988      0
42                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
43             .065   .885   .050  -.240   .057  -.211     .638**    -.255   .941  -.224
44  (M_R)'     .095  1.005   .900  -.180  -.103  -.530        *          *      *      *
45  (AM_R)'   4.395  1.877  1.193 -2.416 -1.401  -.848        *          *      *      *
46  M_C'       .009   .027   .964   .016   .093   .161     .985       .095   .164   .982
47                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
48             .110   .310   .580   .185   .253   .424     .596**     .332   .557   .762
49  (M_R)'     .175  1.195   .630  -.055   .310   .213        *          *      *      *
50  (AM_R)'   5.133  2.455  1.171  -.408  2.834   .959        *          *      *      *
51  M_A'       .030   .120   .850   .060  -.160  -.319     .660      -.173  -.346   .922
52                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
53             .088   .075   .837   .081  -.272  -.251     .571**    -.297  -.274   .915
54  (M_R)'     .198   .385  1.417   .266  -.019   .173        *          *      *      *
55  (AM_R)'   5.129  1.288  1.965  3.102  -.250   .935        *          *      *      *
56  M_B'       .065   .885   .050  -.240   .057  -.211     .616      -.255   .941  -.224
57                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
58             .090   .790   .120  -.267   .104  -.308     .573**    -.300   .889  -.346
59  (M_R)'     .178   .865   .957  -.186  -.168  -.559        *          *      *      *
60  (AM_R)'   4.914  1.850  1.364 -2.415 -2.056  -.956        *          *      *      *
61  M_C'       .110   .310   .580   .185   .253   .424     .533       .332   .557   .762
62                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
63             .110   .320   .570   .188   .251   .427     .532***    .332   .566   .755
Line
32  $(M_R)'$ is the sum of corresponding entries in lines 21 and 30.

34  Since we now have better information as to the location of vector A, we use the information in line 8 (the result of previous iteration with A) as a starting point. Note that iteration with B and C (in the previous trials) has had the effect of reducing $f_p$ from 1.219 (line 8) to 1.114 (line 34).

35-38  Various iterations until the final vector is arrived at, in line 38.

39  $(M_R)'$ is the sum of corresponding entries in lines 30 and 38.

41  The previous vector (line 21) is used as a starting point.

44-48  Another round of iterations with vector C. Note the great drop in $f_p$ from line 30 to line 48 and, likewise, the great shift in $\Lambda_C'$ during this trial.

49-63  The third cycle of iterations for vectors A, B, and C. The changes in $f_p$ are becoming smaller.










TABLE 3 (Concluded)

                                   M_p'                        f_p =             Λ_p'
Line             I²    II²   III²   I·II  I·III II·III   M_p'AM_R        I     II    III
64  (M_R)'     .200  1.110   .690  -.079   .355   .119        *          *      *      *
65  (AM_R)'   5.220  2.452  1.184  -.597  2.929   .854        *          *      *      *
66  M_A'       .088   .075   .837   .081  -.272  -.251     .575      -.297  -.274   .915
67                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
68             .088   .075   .837   .081  -.272  -.251     .575***   -.297  -.274   .915
69  (M_R)'     .198   .395  1.407   .269  -.021   .176        *          *      *      *
70  (AM_R)'   5.134  1.294  1.963  3.117  -.247   .940        *          *      *      *
71  M_B'       .090   .790   .120  -.267   .104  -.308     .572      -.300   .889  -.346
72                ~      ~      ~      ~      ~      ~        ~          ~      ~      ~
73             .090   .790   .120  -.267   .104  -.308     .572***   -.300   .889  -.346
Line
64-68  A fourth round of iterations with vector A; though all possible directions of moves were tried, none reduced $f_p$ by as much as .001.

69-73  A fourth round of iterations with vector B; no shift reduced $f_p$ by as much as .001. Since vectors A and B had not moved from their positions in the third cycle, it was unnecessary to make further iterations with C (unless a higher degree of accuracy had been wanted). The computations were therefore complete, and the final vectors in lines 68, 73, and 63 (for A, B, and C, respectively) were regarded as constituting the analytical solution.
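The whole sequence of Table 3 amounts to cycling through the vectors and lowering each one's $f_p$ while the others stay fixed. The hypothetical Python sketch below mechanizes that idea: each vector in turn is rotated in the I-II, I-III, and II-III planes, a rotation is kept only if it lowers $f_p = M_p'AM_R$ (with $M_R$ the sum of the other vectors' rows of M'), and the starting positions follow lines 3, 15, and 26. The 0.5-degree angular grid, the number of cycles, and all function names are assumptions standing in for the author's judgment-guided trial moves, so the end point need not match the tabled vectors exactly; by construction, however, f can never increase from one step to the next.

```python
import math
from itertools import combinations

# Matrix A transcribed from Table 2 (order I², II², III², I·II, I·III, II·III).
A = [
    [8.469, 2.131, 1.800, 0.032, -0.350, 0.380],
    [2.131, 1.328, 0.377, -0.707, 0.625, 0.117],
    [1.800, 0.377, 0.886, 0.633, -0.497, 0.172],
    [0.032, -0.707, 0.633, 8.529, 0.757, 1.254],
    [-0.350, 0.625, -0.497, 0.757, 7.205, 1.265],
    [0.380, 0.117, 0.172, 1.254, 1.265, 1.510],
]


def m_vector(lam):
    """Squares and cross-products of direction cosines (one row of M')."""
    l1, l2, l3 = lam
    return [l1 * l1, l2 * l2, l3 * l3, l1 * l2, l1 * l3, l2 * l3]


def mat_vec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def criterion(lams):
    """f = sum over p < q of M_p' A M_q."""
    ms = [m_vector(lam) for lam in lams]
    return sum(dot(ms[p], mat_vec(A, ms[q])) for p, q in combinations(range(len(ms)), 2))


def rotate(lam, i, j, theta):
    """Rotate a unit vector in the plane of reference axes i and j."""
    c, s = math.cos(theta), math.sin(theta)
    out = list(lam)
    out[i] = c * lam[i] - s * lam[j]
    out[j] = s * lam[i] + c * lam[j]
    return out


def improve_vector(lam, g, step_deg=0.5, max_passes=50):
    """Lower f_p = M_p' g by grid rotations in the I-II, I-III and II-III planes."""
    best = list(lam)
    best_f = dot(m_vector(best), g)
    steps = int(round(180 / step_deg))
    for _ in range(max_passes):
        improved = False
        for i, j in combinations(range(3), 2):
            base = best
            for k in range(-steps, steps):
                trial = rotate(base, i, j, math.radians(k * step_deg))
                value = dot(m_vector(trial), g)
                if value < best_f - 1e-12:
                    best, best_f, improved = trial, value, True
        if not improved:
            break
    return best


def analytical_solution(cycles=5):
    # Starting positions as in Table 3: A on axis III, B on II, C on III.
    lams = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    history = [criterion(lams)]
    for _ in range(cycles):
        for p in range(3):
            others = [m_vector(lam) for q, lam in enumerate(lams) if q != p]
            m_r = [sum(col) for col in zip(*others)]  # M_R, as in the (M_R)' rows
            lams[p] = improve_vector(lams[p], mat_vec(A, m_r))
            history.append(criterion(lams))
    return lams, history


lams, history = analytical_solution()
print([round(value, 3) for value in history])
for name, lam in zip("ABC", lams):
    print(name, [round(x, 3) for x in lam])
```

The printed history can be compared with the tabled f = .839; a coarser or finer grid, or a different order of planes, may settle at a slightly different value.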


the case. One would like a criterion which would place the planes pretty
much as they would be placed by graphical methods, particularly where
the structure is as clear as it is in the Thurstone box problem. A possible way
out of this difficulty would be (a) to carry out the present solution, (b) to use
some established rule to determine which are the factorially complex tests,
(c) to eliminate the factorially complex tests from the matrix F, and (d) to make
a new solution. As an arbitrary rule for selecting complex tests, it may be
suggested that a complex test be defined as one having entries greater than
.25 in two or more columns of the matrix V.

(2) The method requires a large amount of computational labor. This
should not present a real obstacle, however, if the method can be adapted
for solution by high-speed computing devices. It is also possible that a
non-iterative solution may ultimately be found.
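Step (b) with the suggested rule is easy to mechanize. The Python sketch below applies the rule, read literally as a signed entry greater than .25, to the matrix V of Table 2; because no negative entry in V reaches -.25, an absolute-value reading yields the same list. The function name and its parameters are illustrative only.

```python
# Matrix V of Table 2: projections of tests 1-20 on reference vectors A, B, C.
V = [
    (-0.132, 0.900, -0.094), (0.865, -0.169, -0.153), (-0.113, -0.105, 0.902),
    (0.606, 0.293, -0.164), (-0.167, 0.264, 0.763), (0.251, -0.158, 0.686),
    (0.377, 0.566, -0.175), (-0.158, 0.744, 0.282), (0.643, -0.215, 0.296),
    (0.492, 0.445, -0.168), (-0.175, 0.548, 0.541), (0.478, -0.201, 0.495),
    (-0.100, 0.882, -0.107), (0.847, -0.218, -0.090), (-0.152, -0.075, 0.888),
    (0.197, 0.109, 0.581), (0.297, 0.474, 0.123), (-0.164, 0.885, -0.075),
    (0.834, -0.114, -0.194), (-0.102, -0.108, 0.892),
]


def complex_tests(v, cutoff=0.25, min_columns=2):
    """Test numbers (1-based) with entries above `cutoff` in `min_columns` or more columns."""
    return [number for number, row in enumerate(v, start=1)
            if sum(x > cutoff for x in row) >= min_columns]


complex_list = complex_tests(V)
print(complex_list)  # [4, 5, 6, 7, 8, 9, 10, 11, 12, 17]
```

Under this rule step (c) would remove these ten tests from F before the new solution in step (d) is made.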










REFERENCES 


1. Horst, Paul. A non-graphical method for transforming an arbitrary factor matrix into 
a simple structure matrix. Psychometrika, 1941, 6, 79-99. 

2. Johnson, D. M., and Reynolds, F. A factor analysis of verbal ability. Psychol. Rec., 
1941, 4, 183-195. 

3. Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947. 

4. Tucker, L. R. A semi-analytical method of factorial rotation to simple structure.
Psychometrika, 1944, 9, 43-68.


Manuscript received 8/1/52 








PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


A NEW STATUS INDEX DERIVED FROM SOCIOMETRIC 
ANALYSIS* 


Leo Katz 


MICHIGAN STATE COLLEGE 


For the purpose of evaluating status in a manner free from the defi- 
ciencies of popularity contest procedures, this paper presents a new method 
of computation which takes into account who chooses as well as how many 
choose. It is necessary to introduce, in this connection, the concept of attenua- 
tion in influence transmitted through intermediaries. 


Introduction 


For a considerable time, most serious investigators of inter-personal 
and inter-group relations have been dissatisfied with the ordinary indices 
of “status,” of the popularity contest type. In the sociometric field, for 
example, Jennings (1) says, “... it cannot be premised from the present
research that greater desirability per se attaches to a high [conventional
computation] choice-status as contrasted with a low choice-status in any
sociogroup without reference to its milieu and functioning.” However, in
the absence of better methods for determining status, only two alternatives 
have been open to the investigator. He has been forced either to accept the 
popularity index as valid, at least to first approximation, or to make near- 
anthropological study of a social group in order to pick out the real leaders, 
i.e., the individuals of genuinely high status. 

The purpose of this paper is to suggest a new method of computing 
status, taking into account not only the number of direct “votes” received
by each individual but, also, the status of each individual who chooses 
the first, the status of each who chooses these in turn, etc. Thus, the proposed 
new index allows for who chooses as well as how many choose. 

For the present discussion, an operational definition of status is assumed, 
status being defined by the question asked of the members of the group. The 
same device, then, may be used to study influence, transmission of informa- 
tion, etc. 

The New Status Index 


To exhibit the results of the “balloting,” we shall use the matrix repre- 
sentation for sociometric data as given by Forsyth and Katz (2). An example 
for a group of six persons appears below. In this example, A chooses only 

*This work was done under the sponsorship of the Office of Naval Research. 














F, B chooses C and F, C chooses B, D, and F, and so on. The principal diagonal
elements, by convention, are zeroes. The question asked could be, “Which
people in this group really know what is going on?”











              Chosen
Chooser   A  B  C  D  E  F
   A      0  0  0  0  0  1
   B      0  0  1  0  0  1
   C      0  1  0  1  0  1
   D      1  0  0  0  1  0
   E      0  0  0  1  0  1
   F      1  0  0  1  0  0
Totals    2  1  1  3  1  4


In the Forsyth and Katz formulation, the 6 × 6 array above is referred
to as the choice matrix, $C$, with element $c_{ij}$ = response of individual $i$ to
individual $j$. Further, as pointed out by Festinger (3) for matrices whose
elements are 0 or 1, powers of $C$ have as elements the numbers of chains of
corresponding lengths going from $i$ through intermediaries to $j$. Thus,
$C^2 = (c_{ij}^{(2)})$, where $c_{ij}^{(2)} = \sum_k c_{ik}c_{kj}$; each component, $c_{ik}c_{kj}$, of $c_{ij}^{(2)}$ is equal
to one if and only if $i$ chooses $k$ and $k$ chooses $j$, i.e., there is a chain of length
two from $i$ to $j$. Higher powers of $C$ have similar interpretations.
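Festinger's observation can be checked on the six-person example. This Python sketch (standard library only; the helper names are illustrative) squares the choice matrix C: entry (i, j) of the result counts the two-step chains from i to j, including chains that return to their origin, such as A to F and back to A. Its column sums are the two-step choice counts used in the next paragraph.

```python
# Choice matrix C of the six-person example: C[i][j] = 1 if person i chooses person j.
NAMES = "ABCDEF"
C = [
    [0, 0, 0, 0, 0, 1],  # A chooses F
    [0, 0, 1, 0, 0, 1],  # B chooses C and F
    [0, 1, 0, 1, 0, 1],  # C chooses B, D and F
    [1, 0, 0, 0, 1, 0],  # D chooses A and E
    [0, 0, 0, 1, 0, 1],  # E chooses D and F
    [1, 0, 0, 1, 0, 0],  # F chooses A and D
]


def mat_mul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


C2 = mat_mul(C, C)
print(C2[NAMES.index("A")][NAMES.index("D")])  # 1: the single chain A, F, D
print([sum(col) for col in zip(*C2)])           # [7, 1, 1, 6, 3, 5]
```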

The column sums of C give the numbers of direct choices* made by 
members of the group to the individual corresponding to each column. Also, 
the column sums of $C^2$ give the numbers of two-step choices from the group
to individuals; column sums of $C^3$, numbers of three-step choices, etc. An
index of the type we seek, then, may be constructed by adding to the direct 
choices all of the two-step, three-step, etc., choices, using appropriate weights 
to allow for the lower effectiveness of longer chains. In order to construct 
appropriate weights, we introduce the concept of “attenuation” in a link of 
a chain. 

It is necessary to make some assumptions regarding the effective func- 
tioning of an existing link. The first assumption we make is common to 
all sociometric work, namely, that our information is accurate and that, 
hence, certain links between individuals exist; and where our information 
indicates no link, there is no communication, influence, or whatever else 
we measure. Secondly, we assume that each link independently has the same 
probability of being effective. This assumption, obviously, is no more true 
than is the previous one; however, it seems to be at least a reasonable first 
approximation to the true situation. Thus, we conceive a constant a, de- 
pending on the group and the context of the particular investigation, which 


*In the sequel, it is assumed that C is a matrix of 0’s and 1’s. 

























has the force of a probability of effectiveness of a single link. A $k$-step chain,
then, has probability $a^k$ of being effective. In this sense, $a$ actually measures
the non-attenuation in a link, $a = 0$ corresponding to complete attenuation
and $a = 1$ to absence of any attenuation. With this model, appropriate
weights for the column sums of $C, C^2, \ldots$ are $a, a^2, \ldots$, respectively.
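To see the weighting at work, the following Python sketch adds $a^k$ times the column sums of $C^k$ for $k = 1, 2, \ldots$, stopping after a fixed number of terms (the cut-off is an assumption made only for illustration). With $a = 1/2$, the value used in the paper's numerical example, the partial sums settle on the same index values the paper obtains there, here reached by brute summation rather than by solving equations.

```python
# Choice matrix of the six-person example (rows choose columns; order A-F).
C = [
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
]


def mat_mul(x, y):
    """Plain matrix product of two lists of lists (exact integer arithmetic)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def weighted_chain_sums(choice, a, terms=200):
    """Partial sums of a*colsums(C) + a^2*colsums(C^2) + ... (truncated series)."""
    n = len(choice)
    power = [row[:] for row in choice]  # C^k, starting at k = 1
    totals = [0.0] * n
    for k in range(1, terms + 1):
        col_sums = [sum(col) for col in zip(*power)]
        totals = [t + a ** k * s for t, s in zip(totals, col_sums)]
        power = mat_mul(power, choice)
    return totals


approx_t = weighted_chain_sums(C, 0.5)
print([round(x, 4) for x in approx_t])  # [13.0, 1.0, 1.0, 11.4, 6.2, 12.6]
```

The series converges only when $a$ times the largest characteristic root of C is below one, which is the condition discussed after the derivation.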

We have noted previously that the quantity a depends upon both the 
group and the context; we now examine this notion in greater detail. Suppose 
that our interest is in the communication problem of transmission of inform- 
ation or rumor through a group. It is quite evident that different groups 
will respond in different ways to the same information and, also, that a 
single group will exhibit different responses to various pieces of information. 
For example, the information that the new high-school principal is unmarried 
and handsome might occasion a violent reaction in a ladies’ garden club 
and hardly a ripple of interest in a luncheon group of the local chamber 
of commerce. On the other hand, the luncheon group might be anything 
but apathetic in its response to information concerning a fractional change 
in credit buying restrictions announced by the federal government. 

Some psychological investigations have been directed at exactly this 
point. It is possible that these, or subsequent studies, may reveal that a 
is or is not relatively constant among all existing links in a group with respect 
to a particular context. If it should appear that a is not relatively constant, 
it will be necessary to consider more complicated models. For present pur- 
poses, we shall assume a is relatively constant and that, either by investigation 
or omniscience, its value is known. 

Let $s_j$ be the sum of the $j$th column of the matrix $C$ and $s$ a column
vector with elements $s_j$. In the example above, e.g., the row vector
$s' = (2, 1, 1, 3, 1, 4)$. We wish to find the column sums of the matrix

$$T = aC + a^2C^2 + \cdots + a^kC^k + \cdots = (I - aC)^{-1} - I.$$

$T$ has elements $t_{ij}$ and column sums $t_j = \sum_i t_{ij}$. Let $t$ be a column vector
with elements $t_j$ and $u$ be a column vector with unit elements. Then

$$t' = u'\left[(I - aC)^{-1} - I\right].$$

Multiplying on the right by $(I - aC)$ we have

$$t'(I - aC) = u' - u'(I - aC) = a\,u'C,$$

and by transposition,

$$(I - aC')\,t = aC'u.$$

But $C'u$ is a column vector whose elements are the row sums of $C'$, i.e.,
the column sums of $C$; therefore $C'u = s$. Finally, dividing through by $a$,
we have

$$\left(\frac{1}{a}I - C'\right)t = s.$$
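The system can be solved exactly for the six-person example. In this Python sketch, Gauss-Jordan elimination over `fractions.Fraction` stands in for the abbreviated methods the paper cites (Dwyer); the function name is illustrative. It forms the coefficient matrix $(1/a)I - C'$ and the right-hand side $s$ directly from C and, with $1/a = 2$, returns the values reported in the numerical example.

```python
from fractions import Fraction

# Choice matrix of the six-person example (rows choose columns; order A-F).
C = [
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
]


def status_scores(choice, inv_a):
    """Solve ((1/a) I - C') t = s exactly, where s holds the column sums of C."""
    n = len(choice)
    s = [sum(choice[i][j] for i in range(n)) for j in range(n)]
    # Row i of (1/a) I - C' is 1/a on the diagonal minus column i of C.
    aug = [[Fraction(inv_a if i == j else 0) - choice[j][i] for j in range(n)] + [Fraction(s[i])]
           for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


t = status_scores(C, 2)
print([float(x) for x in t])  # [13.0, 1.0, 1.0, 11.4, 6.2, 12.6]
```

No power of C is formed, in keeping with the remark that follows the derivation.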

















Thus, given $a$, $C$, and $s$, we have only to solve the system of linear equations
above to obtain $t$. Actually, we compute no powers of $C$ although our original
summation was over all powers. The process breaks down in case 1/a is not 
greater than the largest characteristic root of C. (See 5, 168). Some experi- 
ence with computations indicates that reasonable, general-purpose values of 
1/a are those between the largest root and about twice that root. It is evident 
that the effect of longer chains on the index will be greater for smaller values 
of 1/a. Finally, it is a real advantage in computations to choose 1/a equal to 
an integer. In the numerical example of the following section, the largest root 
is less than 1.7 and 1/a is taken equal to 2.0. There is an extensive literature 
on bounds for such roots; in this connection, see the series of papers by 
A. Brauer (6). For matrices of non-negative elements, a simple upper bound 
for the largest root is the greatest row (column) sum; this bound is attained 
when all row (column) sums are equal. For the solution, several abbreviated 
methods of computation are available. See, e.g., Dwyer (4). 
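The check on the largest root can be sketched as follows. The simple row- and column-sum bound from the text gives 3 for the example, which by itself does not show that $1/a = 2$ is large enough. A sharper check, swapped in here and not described in the paper, uses the Collatz-Wielandt bounds: for any positive vector x, the smallest and largest ratios $((I + C)x)_i / x_i$ bracket one plus the largest root, and power iteration on $I + C$ narrows the bracket. For the example both bounds close in on about 1.68, consistent with the statement below that the largest root is less than 1.7.

```python
# Choice matrix of the six-person example (rows choose columns; order A-F).
C = [
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 0],
]


def row_col_bound(m):
    """Smaller of the greatest row sum and the greatest column sum."""
    return min(max(sum(r) for r in m), max(sum(c) for c in zip(*m)))


def collatz_wielandt_bounds(m, iterations=300):
    """Lower and upper bounds on the largest root of a non-negative matrix m."""
    n = len(m)
    x = [1.0] * n
    lower = upper = 0.0
    for _ in range(iterations):
        # y = (I + m) x stays positive, so every ratio y_i / x_i is defined.
        y = [x[i] + sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        ratios = [y[i] / x[i] for i in range(n)]
        lower, upper = min(ratios) - 1.0, max(ratios) - 1.0
        top = max(y)
        x = [value / top for value in y]
    return lower, upper


lower, upper = collatz_wielandt_bounds(C)
print(row_col_bound(C), round(lower, 3), round(upper, 3))
```

Any 1/a above the upper bound is safe; here 1/a = 2 clears it with room to spare.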

The usual index of status is obtained by dividing the column sum $s_j$ by
$n - 1$, the number of possible choices. Using the same notion, we obtain as
divisor of the $t_j$, with $(n-1)_k = (n-1)(n-2)\cdots(n-k)$,

$$m = a(n-1) + a^2(n-1)_2 + a^3(n-1)_3 + \cdots \approx (n-1)!\,a^{n-1}e^{1/a}.^{*}$$

Finally, then, the new status index vector is given by $(1/m)t$, where $t$ is the
vector solution to the system of equations above.
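The divisor m can be computed both ways. This Python sketch evaluates the finite sum exactly with `fractions.Fraction` (the sum ends at $k = n - 1$, since $(n-1)_k$ vanishes beyond that) and compares it with the closed-form approximation; the function names are illustrative. For $n = 6$ and $a = 1/2$ it gives the exact 26.25 and the approximate 27.71 quoted in the numerical example.

```python
import math
from fractions import Fraction


def divisor_exact(n, a):
    """m = a(n-1) + a^2 (n-1)_2 + ... + a^(n-1) (n-1)_(n-1), summed exactly."""
    total, falling = Fraction(0), 1
    for k in range(1, n):
        falling *= n - k  # (n-1)_k = (n-1)(n-2)...(n-k)
        total += Fraction(a) ** k * falling
    return total


def divisor_approx(n, a):
    """The closed-form approximation (n-1)! a^(n-1) e^(1/a)."""
    return math.factorial(n - 1) * a ** (n - 1) * math.exp(1 / a)


exact = divisor_exact(6, Fraction(1, 2))
approx = divisor_approx(6, 0.5)
print(exact, float(exact), round(approx, 2))  # 105/4 26.25 27.71
```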


A Numerical Example 


We shall consider the example of the group of six persons whose choice 
matrix is given at the beginning of the paper. For this group, the conventional
technique of dividing column sums by $n - 1 = 5$ produces the


Conventional Status Vector = (.4, .2, .2, .6, .2, .8). 


Going beyond the surface question of “How many choose X?” to the
deeper question of “Who chooses X?” reveals certain important features of
this artificially constructed group. F and D are, apparently, of highest status. 
A, however, is chosen by both of these though he is not chosen by any of 
the “small fry” in the group. Is not A’s status higher than is indicated by 
the conventional computation? 

Secondly, the positions of the three low-status persons are not identical. 
B and C choose each other and are chosen by no one else in the group. E, 
on the other hand, has contact with the rest of the group through D and is 
in a somewhat different position than B and C. 


*The approximation improves with increasing n. The relative error < 1/[a"~#(n — 2) !e!/¢],
For example, when n = 25, a = }, the relative error <4 X 107". 

























Other features might be pointed out, such as that F’s choice of D is
not reciprocated, etc. But this is enough to illustrate the well-known
deficiencies in the conventional computations. We pass now to actual
computation of the vector $t$.

We first write out the required equations, using $a = 1/2$ for simplicity.
The coefficients of $t_1, t_2, \ldots, t_6$ are the negatives of the elements of the transpose of $C$,
with $1/a = 2$ added to each principal diagonal term. The equations are

$$
\begin{aligned}
2t_1 - t_4 - t_6 &= 2\\
2t_2 - t_3 &= 1\\
-t_2 + 2t_3 &= 1\\
-t_3 + 2t_4 - t_5 - t_6 &= 3\\
-t_4 + 2t_5 &= 1\\
-t_1 - t_2 - t_3 - t_5 + 2t_6 &= 4,
\end{aligned}
$$


and the resulting values of $t_1, \ldots, t_6$ are 13, 1, 1, 11.4, 6.2, and 12.6,
respectively. The approximate computation of $m = 27.71$ agrees fairly well,
even here with $n = 6$ only, with the exact value of 26.25. Dividing the $t_j$ by
27.71 gives the


New Status Vector = (.47, .04, .04, .41, .22, .45). 
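Both status vectors can be reproduced from quantities given in the text. This Python sketch takes t from the solution above and the column sums s of C, divides s by $n - 1$ and t by the approximate m, and rounds to two places.

```python
import math

s = [2, 1, 1, 3, 1, 4]           # column sums of C (choices received)
t = [13, 1, 1, 11.4, 6.2, 12.6]  # solution of the linear system, as reported
n, a = 6, 0.5

m_approx = math.factorial(n - 1) * a ** (n - 1) * math.exp(1 / a)
conventional = [round(x / (n - 1), 2) for x in s]
new_status = [round(x / m_approx, 2) for x in t]
print(conventional)  # [0.4, 0.2, 0.2, 0.6, 0.2, 0.8]
print(new_status)    # [0.47, 0.04, 0.04, 0.41, 0.22, 0.45]
```

Dividing by the exact m of 26.25 instead would raise every entry of the new vector slightly without changing its order.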


Comparison of the new with the conventional computation above indicates 
that every change is in the appropriate direction to overcome the short- 
comings in the index pointed out previously and the new status indices are 
in much more nearly correct relative position. 


REFERENCES 


1. Jennings, H. H. Leadership and isolation. New York: Longmans, Green, 1943 and 1950.

2. Forsyth, E., and Katz, L. A matrix approach to the analysis of sociometric data: 
preliminary report, Sociometry, 1946, 9, 340-347. 

3. Festinger, L. The analysis of sociograms using matrix algebra. Human Relations, 1949, 
2, 153-158. 

4. Dwyer, P.S. Linear computations. New York: Wiley and Sons, 1951. 

5. Ferrar, W. L. Finite matrices. London: Oxford Univ. Press, 1951.

6. Brauer, A. Limits for the characteristic roots of a matrix. Series of four papers in
Duke Mathematical Journal, I: 1946, 13, 387-395; II: 1947, 14, 21-26; III: 1948, 15,
871-877; IV: 1952, 19, 75-91.




Manuscript received 7/3/52 


Revised manuscript received 8/8/52 























PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


A FACTOR ANALYSIS OF INTRA-TASK PERFORMANCE 
ON TWO PSYCHOMOTOR TESTS 


EDWIN A. FLEISHMAN*


USAF AIR TRAINING COMMAND 
HUMAN RESOURCES RESEARCH CENTER 


There are indications that even during the short time of administration 
of a single psychomotor test, the ability or abilities sampled may shift materi- 
ally in importance. It then becomes important to know the stages in which 
these fluctuations occur, the stage at which the test is most complex, and 
the stage at which the test most nearly measures one ability at a time. This 
paper describes an application of factorial methods to this problem. Factor 
analysis of inter-trial correlations on two models of the Rudder Control
Test revealed three factors, “Steadiness-Control,” “Precision of Movement,”
and “Strength.” The same factor pattern was confirmed in a separate
factor analysis on another sample in which the order of administration of 
the tests was reversed. Implications are pointed out for future psychomotor 
test development. 


Introduction 


It has generally been found that psychomotor tests, such as those used 
in the Air Force classification program, are factorially complex; that is, they 
tend to measure several abilities. For this reason the apparatus tests cur- 
rently in use in the Aircrew Battery correlate quite highly with each other 
and also with other tests in the battery (2, 3). Although these tests possess 
considerable validity for pilot selection, higher over-all validity would be 
contributed to the total battery if the individual psychomotor tests had 
extremely low correlations with each other, even if the tests had somewhat 
lower individual validities. While test development in other aptitude areas 
is increasingly being aimed at the sampling of one ability category at a time 
through insights provided largely by factor analysis, little attempt in this 
direction has been made in developing psychomotor tests. The implications 
for developing such psychomotor tests would be especially important for 
classification, since one could avoid having to weight into various composite 
aptitude indices invalid variance along with the valid variance found to 
exist in complex tests. 


*Perceptual and Motor Skills Research Laboratory, Lackland Air Force Base, 
San Antonio, Texas. The opinions or conclusions contained in this report are those of
the author. They are not to be construed as reflecting the views or indorsement of the 
Department of the Air Force. 

There is an assumption here that it is possible to break down the variance in per- 
formance of complex psychomotor tasks into simpler, more fundamental psychomotor 
functions. Whether the introduction of complexity of function (e.g., tasks requiring
coordination or integration of operations) introduces variance not reproducible by any
number of more analytical tasks remains a problem for future research. 


















There is also some indication that even during the short time period of 
the administration of a single psychomotor test, the ability or abilities 
sampled may shift materially in importance. If this is true, then it becomes 
important to know (1) at what stages in the task these systematic changes 
in function occur, (2) at which stage the task is most complex in the number 
of abilities measured, and (3) at which stage the task is most nearly measuring 
one ability at a time. 

Factor analysis methods would appear to provide one of the most 
fruitful approaches to this problem. Information derived from such analyses 
may provide a basis for deriving part scores from such tests, in order to 
determine which of the factors isolated in a test are most valid. If certain 
trials are found more valid than others, it may be possible to emphasize the 
strongest factors in those trials in the design of new experimental models of 
the test. Or, as an alternative, trials yielding the highest and purest measure 
of certain factors could be scored separately and validated independently. 


Purpose 


The purpose of this paper is to describe an application of factorial 
methods to the trial scores of two forms of the Rudder Control Test, the 
Standard Rudder Control (CM120C) (3), and the Experimental Six-Target 
Rudder Control Tests (1, 4). In the Rudder Control Test, the examinee 
sits in a mock cockpit of an airplane. His own weight throws the seat off 
balance unless he applies correction by means of rudder foot-pedals. Pushing 
the right rudder pedal causes the apparatus to swing to the right, and pushing 
the left pedal causes it to swing to the left. The examinee’s task is to keep 
the cockpit pointed directly at one of three target lights situated on a panel 
before him. The task seems to require a keen appreciation of loss of balance 
and a quick but not over-controlled correction made by leg action. 

The Experimental Six-Target Rudder Control Test involves the same 
apparatus as the Standard Model, except the examinee is provided with a 
panel of six target lights to which he must successively shift the apparatus 
as each is presented. 

These tests were selected for several reasons. First, the Standard 
Rudder Control Test generally has been found the most valid single test in 
the Aircrew Battery. On the other hand, little is known about its factorial 
content. Previous factor analyses of the Aircrew Battery have revealed it 
to have relatively low communality with respect to the other Aircrew tests 
(2, 3). Moreover, administration conditions of the test are such that more 
than one factor is suspected. In addition, it has been shown in a previous 
study (1), that the correlation between the Standard Rudder Control Test 
and the Six-Target Rudder Control Test depends somewhat on which test is 
given first. The question arises as to the extent to which the change in corre- 










lation due to administration order can be explained by systematic changes 
in the abilities involved at various stages of performance on each task. 


Method 


The Standard Rudder Control Test and the Experimental Six-Target 
Rudder Control Test were administered to 698 pilot-cadets. In 356 cases, 
the Standard Model was administered before the Experimental Model. In 
342 cases, the Experimental Model was administered first. 

The eight-minute testing period for the Standard Model was divided
into six one-minute trials separated by 30-second rest periods. This is
the standard operating procedure followed in all operational administrations 
of this test. The Experimental Model Test period was divided into four 
two-minute trials with 30-second rest periods. Separate scores were recorded 
at the end of each trial for each of the tests. 

The score derived from the Standard Model was the total time the 
apparatus was held on target by the subject. The score derived from the
Experimental Six-Target Model was the number-of-targets achieved by the 
subject. This test is of the self-pacing type, in which the subject must shift 
the apparatus to as many successively presented targets as possible within 
the testing period. He must hold the apparatus on target steadily for three 
seconds before a new target is presented.* 

Correlations among all the trial scores of both tests were obtained. 
Separate matrices were obtained for each administration order of the two 
tests. Each inter-trial correlation matrix was then factor analyzed by 
Thurstone’s Centroid Method, and the axes rotated to simple structure. 
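As a rough illustration of Thurstone's centroid method, the following Python sketch extracts factors one at a time. It is not the computing routine used in this study. It assumes two working choices that the text does not specify: each diagonal cell is filled with the highest absolute correlation in its column as the communality estimate, and before each new factor is taken out the residual matrix is sign-reflected until no column sum is negative. The function names are hypothetical. The example matrix is the four Six-Target trials of Table 1.

```python
"""Minimal sketch of Thurstone's centroid method of factor extraction."""
from math import sqrt


def fill_diagonal(r):
    """Copy r, putting the largest absolute off-diagonal entry of each
    column on the diagonal (assumed communality estimate)."""
    n = len(r)
    out = [row[:] for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


def reflect(r, tol=1e-12):
    """Reverse the sign of variables until no off-diagonal column sum is
    negative; return the reflected matrix and each variable's sign."""
    n = len(r)
    m = [row[:] for row in r]
    signs = [1] * n
    while True:
        sums = [sum(m[i][j] for i in range(n) if i != j) for j in range(n)]
        j = min(range(n), key=sums.__getitem__)
        if sums[j] >= -tol:
            return m, signs
        signs[j] = -signs[j]
        for k in range(n):
            if k != j:  # the diagonal keeps its sign
                m[j][k] = -m[j][k]
                m[k][j] = -m[k][j]


def centroid_factors(r, n_factors):
    """Return a list of loading vectors, one per extracted factor."""
    n = len(r)
    residual = [row[:] for row in r]
    factors = []
    for _ in range(n_factors):
        m, signs = reflect(fill_diagonal(residual))
        total = sum(map(sum, m))
        if total <= 1e-12:  # nothing left to extract
            break
        # Column sum over the square root of the grand sum, with the
        # reflected variables given back their original sign.
        loadings = [signs[j] * sum(m[i][j] for i in range(n)) / sqrt(total)
                    for j in range(n)]
        factors.append(loadings)
        # Remove the factor just extracted before taking out the next one.
        residual = [[residual[i][k] - loadings[i] * loadings[k]
                     for k in range(n)] for i in range(n)]
    return factors


# Trials 7-10 of Table 1 (the four Six-Target trials, Standard given first).
six_target = [
    [1.00, 0.82, 0.75, 0.70],
    [0.82, 1.00, 0.79, 0.73],
    [0.75, 0.79, 1.00, 0.76],
    [0.70, 0.73, 0.76, 1.00],
]
for k, f in enumerate(centroid_factors(six_target, 2), start=1):
    print(f"Factor {k}:", [round(a, 2) for a in f])
```

The reflection loop always stops, because each sign reversal raises the sum of the off-diagonal entries and only finitely many sign patterns exist. For these four trials the first centroid factor has loadings of about .84 to .90.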


Results 


Table 1 presents the intercorrelations of trial scores for both tests when
the Standard Model was given first. Table 2 presents a similar matrix for 
the sample in which the Six-Target Model was given first. 

Separate factor analyses were made for each of the two correlation 
matrices. 


Factor Analysis of Trial Intercorrelations when the Standard Test is
Administered First


Four factors were extracted from the matrix of inter-trial correlations 
obtained when the Standard Rudder Control Test was administered before 


*In a previous study (1), a time-on-target score was also derived from the Six-Target
Test. The correlation between number-of-targets and time-on-target scores was
.91. Since the reliability of the number-of-targets score proved higher, this score has
been used as the measure of performance on the test.

†The correlation between the total scores of both tests was .58 when the Standard
Test was given first and .44 when the Six-Target Test was given first. Although this
difference is not as marked as that found previously on a smaller sample (1), it is
in the same direction and is still statistically significant.

TABLE 1
Intercorrelations Between Trial Scores of the Standard and Six-Target Rudder Control
Tests when the Standard Model is Administered First*
(N = 356)

Test                            Trial    1    2    3    4    5    6    7    8    9   10
 1. Standard Rudder Control        1    --   74   64   70   57   48   43   35   33   29
 2. Standard Rudder Control        2    74   --   73   71   58   61   41   35   35   29
 3. Standard Rudder Control        3    64   73   --   66   62   56   45   40   40   33
 4. Standard Rudder Control        4    70   71   66   --   77   67   55   50   49   46
 5. Standard Rudder Control        5    57   58   62   77   --   71   54   50   50   49
 6. Standard Rudder Control        6    48   51   56   67   71   --   52   50   52   48
 7. Six-Target Rudder Control      1    43   41   45   55   54   52   --   82   75   70
 8. Six-Target Rudder Control      2    35   35   40   50   50   50   82   --   79   73
 9. Six-Target Rudder Control      3    33   35   40   49   50   52   75   79   --   76
10. Six-Target Rudder Control      4    29   29   33   46   49   48   70   73   76   --
*Decimal points are omitted.
TABLE 2
Intercorrelations Between Trial Scores of the Standard and Six-Target Rudder Control
Tests when the Six-Target Model is Administered First*
(N = 342)

Test                            Trial    1    2    3    4    5    6    7    8    9   10
 1. Six-Target Rudder Control      1    --   85   76   71   31   20   12   31   26   30
 2. Six-Target Rudder Control      2    85   --   89   85   35   25   21   38   30   30
 3. Six-Target Rudder Control      3    76   89   --   90   39   31   29   46   36   35
 4. Six-Target Rudder Control      4    71   85   90   --   41   34   30   44   34   37
 5. Standard Rudder Control        1    31   35   39   41   --   49   53   53   49   48
 6. Standard Rudder Control        2    20   25   31   34   49   --   52   50   44   39
 7. Standard Rudder Control        3    12   21   29   30   53   52   --   52   46   35
 8. Standard Rudder Control        4    31   38   46   44   53   50   52   --   62   50
 9. Standard Rudder Control        5    26   30   36   34   49   44   46   62   --   49
10. Standard Rudder Control        6    30   30   35   37   43   39   35   50   49   --
*Decimal points are omitted.


the Six-Target Rudder Control Model. Table 3 presents the centroid factor 
matrix obtained. Table 4 presents the orthogonal solution of rotated factor 
loadings obtained using the criteria of simple structure and positive manifold. 
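Because the rotation from Table 3 to Table 4 is orthogonal, it can change how a trial's common variance is divided among the factors but not the communality h² itself, the sum of that trial's squared loadings. The sketch below checks this on the legible rows of Tables 3 and 4 (Table 4's row for Six-Target trial 1 is partly illegible and is skipped, as is Standard trial 1), and applies a single plane rotation to show the invariance exactly. The helper `rotate_plane` is a hypothetical illustration, not the graphical rotation procedure used in the study.

```python
from math import cos, sin, radians

# Loadings (decimal points restored) for trials whose rows are fully legible:
# Standard trials 2-6 and Six-Target trials 2-4.
centroid = [  # Table 3, factors I-IV
    (.72, .44, -.26, .11), (.73, .35, -.17, .14), (.83, .30, .11, .14),
    (.80, .21, .27, .08), (.75, .16, .29, .19), (.76, -.47, .07, -.09),
    (.75, -.46, .08, -.18), (.70, -.45, .13, -.12),
]
rotated = [  # Table 4, factors I-IV (same trials, same order)
    (.38, .80, .03, .06), (.43, .71, .11, .09), (.58, .61, .33, .00),
    (.63, .45, .40, -.08), (.58, .40, .46, .01), (.90, .05, -.09, .00),
    (.90, .05, -.06, .01), (.85, .09, .01, .03),
]


def communality(row):
    """h^2: sum of squared loadings of one test on all factors."""
    return sum(a * a for a in row)


def rotate_plane(row, p, q, degrees):
    """Rotate factor axes p and q through the given angle (an orthogonal move)."""
    t = radians(degrees)
    out = list(row)
    out[p] = row[p] * cos(t) + row[q] * sin(t)
    out[q] = -row[p] * sin(t) + row[q] * cos(t)
    return out


diffs = [abs(communality(c) - communality(r)) for c, r in zip(centroid, rotated)]
# Small: the printed loadings are rounded to two decimals.
print("largest h^2 discrepancy:", round(max(diffs), 3))

row = centroid[0]
print(communality(row), communality(rotate_plane(row, 0, 1, 37.0)))
```

Rotation through any angle in any plane of the factor space leaves h² unchanged; the small differences between the two printed tables presumably reflect rounding.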





Interpretation of the Factors 
Factor I derives its highest loadings from the four trials on the Six- 
Target Rudder Control Test. These trials seem to measure this factor 
uniquely. In these trials, it will be recalled, the subject must shift the

TABLE 3
Centroid Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Standard Model is Administered First*

                                                  Factors
Test                             Trial      I     II    III     IV      h²
 1. Standard Rudder Control         1      70     41    -25     18      75
 2. Standard Rudder Control         2      72     44    -26     11      79
 3. Standard Rudder Control         3      73     35    -17     14      71
 4. Standard Rudder Control         4      83     30     11     14      81
 5. Standard Rudder Control         5      80     21     27     08      76
 6. Standard Rudder Control         6      75     16     29     19      71
 7. Six-Target Rudder Control       1      79    -39     12    -16      82
 8. Six-Target Rudder Control       2      76    -47     07    -09      81
 9. Six-Target Rudder Control       3      75    -46     08    -18      81
10. Six-Target Rudder Control       4      70    -45     13    -12      72
Σa²/k                                      57     14     04     02
*Decimal points are omitted.
TABLE 4
Rotated Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Standard Model is Administered First*

                                                  Factors†
Test                             Trial      I     II    III     IV      h²
                                           PM     SC      S      R
 1. Standard Rudder Control         1      36     78     07     12      75
 2. Standard Rudder Control         2      38     80     03     06      79
 3. Standard Rudder Control         3      43     71     11     09      71
 4. Standard Rudder Control         4      58     61     33     00      81
 5. Standard Rudder Control         5      63     45     40    -08      76
 6. Standard Rudder Control         6      58     40     46     01      71
 7. Six-Target Rudder Control       1      91      ?      ?    -01      82
 8. Six-Target Rudder Control       2      90     05    -09     00      81
 9. Six-Target Rudder Control       3      90     05    -06     01      81
10. Six-Target Rudder Control       4      85     09     01     03      72
Σa²/k                                      47     25     05     00

*Decimal points are omitted.
†Factors are identified as follows: I. Precision of Movement; II. Steadiness-Control; III. Strength;
IV. Residual.

apparatus to as many successively presented targets as possible during the
testing period. The primary task, then, is to move the apparatus as precisely
as possible with controlled movements of the foot pedals. For this reason,
this factor has been labeled “Precision of Movement Under Speed Conditions.”
This interpretation seems to be confirmed by the relatively high, though
somewhat lower, loadings on the factor of the last three trials of the
Standard Rudder Control Test. This seems plausible since the standard
administration condition of this test changes at the beginning of trial 4.
In trials 4 through 6, the task changes from a single-target task to a
three-target task, in which the subject must not only hold the apparatus on
target but must also shift to whichever of the three targets is presented
during each trial. The relatively low but significant loadings of the first
three trials of the Standard Test on this factor may be explained by the fact
that at the start of each of these trials, the subject must still move the apparatus
from a side position to the single center target when it is presented. The
extent to which he can do this precisely thus contributes in some degree to
the total time the apparatus is held on target during each of these trials.

Factor II is common only to the six trials of the Standard Rudder Control 
Test. Moreover, the highest loadings on this factor are derived from the 
first three trials of this test. In these trials, the subject’s only task, once 
he gets on target, is to hold the apparatus steadily on target. For this reason, 
this factor has been called “Steadiness-Control.” This interpretation is also
supported by the substantial but somewhat lower loadings on this factor of 
the last three trials of the Standard Test. Although in these trials the subject 
is presented successively with three targets to which he must shift the appara- 
tus as each appears, he is still required to hold the apparatus on each target 
from 7.5 to 27.5 seconds before a new target can be presented. 

Moreover, this factor does not appear in the four trials of the Experi- 
mental Six-Target Test. Since in this test it is necessary for the subject to 
hold the apparatus on target only three seconds before a new target is pre- 
sented, further support is obtained for the interpretation that this factor 
represents steadiness of some kind. 

Factor III is common only to the last three trials of the Standard Rudder 
Control Test. Interpretation of this factor is thus somewhat more difficult. 
There is reason to believe, however, that this factor represents a strength 
function. In these three trials the subject is presented with three targets. 
When either one of the side targets is presented, the subject must hold the 
apparatus steadily at a rather difficult angle. This involves considerable 
muscular tension, especially in the legs, in order to keep the apparatus lined 
up in these positions. It will be recalled that during these three trials these 
positions must often be maintained as long as 27.5 seconds on certain settings. 
Moreover, the burden of keeping the apparatus on target is concentrated 
mostly on one leg at a time in these positions. The combination of such
long-time delays together with side target angles is a task situation which 
appears only in these three trials. It was thus decided to call this factor 
“Strength.” 

Factor IV is a residual factor, containing only insignificant loadings. 


Factor Analysis of Trial Intercorrelations when the Six-Target Test is 
Administered First 
A similar analysis was made of the matrix of inter-trial correlations 
(Table 2) obtained when the Six-Target Rudder Control Test was admin- 
istered before the Standard Model. This was done independently of the 


TABLE 5
Centroid Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Six-Target Model is Administered First*

                                                  Factors
Test                             Trial      I     II    III     IV      h²
 1. Six-Target Rudder Control       1      68    -52    -20    -19      81
 2. Six-Target Rudder Control       2      77    -55    -09    -13      92
 3. Six-Target Rudder Control       3      82    -47     12    -16      93
 4. Six-Target Rudder Control       4      81    -42     16    -13      88
 5. Standard Rudder Control         1      65     30     13     14      55
 6. Standard Rudder Control         2      58     35     17     11      50
 7. Standard Rudder Control         3      56     42     26     11      57
 8. Standard Rudder Control         4      72     33     13     16      66
 9. Standard Rudder Control         5      64     36     20     12      60
10. Standard Rudder Control         6      58     24     19     15      45
Σa²/k                                      47     17     03     02

*Decimal points are omitted.


first analysis. Table 5 presents the centroid factor matrix obtained. Again, 
four factors were extracted and the axes rotated to simple structure. Table 6 
presents the orthogonal solution of rotated factor loadings. 

It can be seen that this analysis reproduces the factor pattern
obtained in the first analysis. Factor I is the “Precision of Movement”
factor, with the strongest loadings on the four trials of the Six-Target
Test. Factor II is “Steadiness-Control” and again is common only to the
Standard Test, with highest loadings on the first three trials. Factor III 
is the factor originally called “Strength” and again is found only in the last 
three trials on the Standard Rudder Control Test. Factor IV is a residual 
factor, although a low-order doublet appears in the loadings of the first 
two trials of the Six-Target Model. This may be an “initial adjustment”

TABLE 6
Rotated Factor Loadings of Intra-Task Performance on the Standard and Six-Target
Rudder Control Tests, when the Six-Target Model is Administered First*

                                                  Factors†
Test                             Trial      I     II    III     IV      h²
                                           PM     SC      S      R
 1. Six-Target Rudder Control       1      82     01     03     38      81
 2. Six-Target Rudder Control       2      92     02    -02     27      92
 3. Six-Target Rudder Control       3      96     03     02    -01      93
 4. Six-Target Rudder Control       4      94     01     08     02      88
 5. Standard Rudder Control         1      35     65     04     02      55
 6. Standard Rudder Control         2      26     66     02    -03      50
 7. Standard Rudder Control         3      20     73    -04    -07      57
 8. Standard Rudder Control         4      41     63     31     06      66
 9. Standard Rudder Control         5      35     58     37     03      60
10. Standard Rudder Control         6      36     47     31     07      45
Σa²/k                                      40     23     03     02

*Decimal points are omitted.
†Factors are identified as follows: I. Precision of Movement; II. Steadiness-Control; III. Strength;
IV. Residual.


factor of some kind present in this more difficult test when it is given first. 
On the basis of this limited evidence, however, no additional significance 
can be attributed to this factor. 
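One way to put a number on the statement that the two analyses show the same pattern is Tucker's coefficient of congruence between the loadings of a factor in the two solutions, φ = Σxy / √(Σx² Σy²). This is offered only as an illustration and was not computed in the original study. The sketch aligns the Precision of Movement loadings of Tables 4 and 6 by trial, since the two tables list the tests in different orders.

```python
from math import sqrt


def congruence(x, y):
    """Tucker's coefficient of congruence between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    return num / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Factor I (Precision of Movement), ordered Standard trials 1-6,
# then Six-Target trials 1-4.
standard_first = [.36, .38, .43, .58, .63, .58, .91, .90, .90, .85]    # Table 4
six_target_first = [.35, .26, .20, .41, .35, .36, .82, .92, .96, .94]  # Table 6

print(round(congruence(standard_first, six_target_first), 3))
```

For these loadings the coefficient is about .98, so the Precision of Movement profiles from the two administration orders are closely similar even though the Standard trials load less on this factor when that test is given second.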


Discussion 


The results indicate that three primary factors account for intra-task
performance on the Standard and Six-Target Rudder Control Tests.
Moreover, the same factor pattern is found for the two tests regardless of
their order of administration. The Six-Target Test appears the less complex
factorially, with practically all of its common variance accounted for by the
“Precision of Movement” factor. The Standard Test, on the other hand, is
factorially complex and contains all three factors. The last three trials of this
test are the most complex, with substantial loadings on all three factors. The
first three trials of this test provide the best measure of the “Steadiness-Control”
factor, but also have some variance contributed by the “Precision
of Movement” factor.

The results also show that the communalities of the Six-Target Test 
remain high for each administration order, but the communalities of the 
trials of the Standard Test are lower when this test is administered
second (.45 to .66, as against .71 to .81 when it is given first). This unexplained variance suggests that additional factors may
still be present in the Standard Test under these conditions.

The correlation between the total scores on each of the two tests appears 
primarily explainable in terms of the only factor common to the two tests, 
Precision of Movement. The fact that a lower correlation is obtained when 
the Standard Model is presented second is partially attributable to the lower 
loadings of the Standard Model trials on this factor when it is given second and 
to the additional specific variance which appears in this administration 
order. Since the factor pattern is the same for each trial in each administra- 
tion order, the correlation change is not attributable to the appearance of 
different combinations of these factors at different stages in each task situation. 
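With orthogonal factors, the part of the correlation between two tests that is accounted for by the common factors is the sum, over factors, of the products of their loadings, r_jk ≈ Σ a_jm a_km. This is the sense in which a correlation is "explained" by a factor. The sketch below applies the rule to Standard trial 1 and Six-Target trial 2 from Table 4 and splits the reproduced value into per-factor contributions. It is an illustration using the printed loadings, not a computation reported in the study.

```python
FACTORS = ("Precision of Movement", "Steadiness-Control", "Strength", "Residual")

# Table 4 (Standard Test given first), decimal points restored.
standard_trial_1 = (.36, .78, .07, .12)
six_target_trial_2 = (.90, .05, -.09, .00)


def reproduced(a, b):
    """Correlation implied by orthogonal common factors: sum of a_m * b_m."""
    return sum(x * y for x, y in zip(a, b))


for name, x, y in zip(FACTORS, standard_trial_1, six_target_trial_2):
    print(f"{name:22s} {x * y:+.3f}")
print("reproduced:", round(reproduced(standard_trial_1, six_target_trial_2), 2),
      "observed (Table 1): 0.35")
```

The reproduced value, about .36, is close to the observed .35, and nearly all of it (.324 of .357) comes through the Precision of Movement factor.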

These results have certain implications for future test development. 
Since the Rudder Control Test has generally been found to be the most 
valid test in the Aircrew Battery, it may now be possible to find out just 
what it is in the test that contributes most of the valid variance. It may be 
found that each of these components is differentially valid in predicting 
pilot success, or that practically all the valid variance is contributed by only 
one of the factors. If the latter is true, then it would be important to exclude 
from the test those factors which are not valid, and to emphasize the measure- 
ment of the valid factor at the expense of the less valid ones in future models 
of the test. This could presumably lower the correlation of the test with
other tests in the battery and could exclude much of the invalid variance
that is now weighted into the composite stanine along with the valid variance
in the test.

If, on the other hand, more than one factor were found to contribute 
valid variance, then separate sub-test scores could be derived for each factor 
and each separate score could be weighted appropriately. For example, 
scores on the first three trials of the Standard Test would be the best available 
measure of the “Steadiness-Control” factor. Scores on the four trials of the
Experimental Six-Target Rudder Control Test give the strongest and purest
measure of the “Precision of Movement” factor found in the Standard
Test. After the effects of the other two factors are partialled out, the last 
three trials of the Standard Test would provide the best measure of the 
“Strength” factor. 

Another possibility would be to include criterion scores in the correlation 
matrix of trials, and to repeat the factor analyses. The loadings of the criterion
scores on each factor should indicate the contribution that each factor may
be expected to make to the validity of the test.

The study also provides suggestions for the design of new models of 
the Rudder Control Test for further experimental or operational use. It 
would be possible to design the test in order to provide for three alternative 
administration conditions, each condition emphasizing the measurement of 
one of the factors. Separate scores could be derived from each “sub-test.”

For example, the results indicate that trials 1-3 of the Standard Test 
are the strongest measures of Factor II, Steadiness-Control. However,
these trials also have some loadings on Factor I, Precision of Movement. 
It would be possible to design the test to minimize the initial movement 
required during these trials. As the test is now constituted, the apparatus
rests in an extreme side position when the test is not in operation and during
rest periods. By providing a slightly off-center rest position during the first
three trials, in which the subject’s task is to “stay on” the center target,
the loadings on Factor I would be expected to decrease, leaving a purer
measure of “Steadiness-Control” in these trials.

Since the four trials of the Six-Target Test appear to provide the purest
measure of the “Precision of Movement” factor, it seems reasonable
that six targets might be introduced in a new model of the test instead of
the factorially complex three-target procedure now used. Administration
of the six-target trials could be under the same self-pacing conditions as
those followed in the experimental model, with the same short-time delays
between new target presentations. 

With respect to providing for maximization of the “Strength” factor 
during certain trials of new models of the test, the following possibilities are 
indicated. This factor appears in the three-target Standard Test where 
the subject must often hold the apparatus for long-time delays on one or 
the other of the side targets before a new target is presented. If the hypothe- 
sis about this factor is tenable, it is conceivable that the factor might be 
maximized by requiring the subject to hold the apparatus on more widely 
spaced side targets for somewhat longer time delays. This could easily be 
provided for by incorporating this procedure in conjunction with the six- 
target panel suggested previously. During these trials, however, only the 
extreme targets would be used and the time delay would be longer (e.g., 
20-28 seconds). In addition, increasing the spring tension on the foot pedals 
would presumably give further emphasis to this factor. 

It would be possible for a re-designed test to combine these features 
by providing for three trials in which a center target only is used, three 
trials in which the six targets are used under self-pacing conditions, and 
three trials in which only the extreme side targets are used with long-time 
delays. An apparatus of this type would allow further confirmation of the 
factors isolated in this study, if a factor analysis of performance on the new 
apparatus increased or reduced trial factor loadings in the expected directions. 
Moreover, each sub-task would presumably be in a more factorially “pure”
form for further validation study against pilot success. 

From a methodological point of view, the study indicates that the appli- 
cation of factorial methods to intra-task performance is another fruitful 
approach to the study of the nature of aptitude tests in the psychomotor area. 
Although factor analysis methods have recently been used to define
homogeneous sub-tests among the items of printed tests, little application
has been made of these methods to performance within apparatus tests.
The present study seems to indicate that results of such analyses can contribute 
to the isolation of important variables and can suggest leads for possible 
test improvement in this aptitude area. 

The interpretations in this study are, of course, restricted by the limited 
number of variables in the analysis. Future studies might well include 
additional variables whose factorial content is well-established. Although 
the present study was aimed at investigating factors involved during per- 
formance of a test in its operational setting, future studies might also include 
factor analyses of extended practice on psychomotor tasks. Such studies 
should lead to a better understanding of the influence of systematic changes 
in function involved at different stages in performance of psychomotor tasks. 


REFERENCES 


1. Fleishman, E. A., and Reynolds, B. Comparative data on the Standard Rudder
Control Test and the Experimental Six-Target Rudder Control Test. Research
Note P&MS 52-4, Human Resources Research Center, Lackland Air Force Base,
San Antonio, Texas, 1952.

2. Guilford, J. P. (Ed.) Printed classification tests. AAF Aviation Psychology Program
Research Report, No. 5. Washington: U. S. Govt. Printing Office, 1947.

3. Melton, A. W. (Ed.) Apparatus tests. AAF Aviation Psychology Program Research
Report, No. 4. Washington: U. S. Govt. Printing Office, 1947.

4. Perceptual and Motor Skills Research Laboratory. A six-target rudder control task.
Research Note P&MS 50-3, Human Resources Research Center, Lackland Air Force
Base, San Antonio, Texas, 1950.


Manuscript received 7/1/52 
Revised manuscript received 8/8/52

PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


AN APPLICATION OF CONFIDENCE INTERVALS 
AND OF MAXIMUM LIKELIHOOD TO THE 
ESTIMATION OF AN EXAMINEE’S ABILITY* 


Frederic M. Lord


EDUCATIONAL TESTING SERVICE 


A mathematical definition of the theoretical relation between the examinee’s
actual responses to the test items and his “true ability” is selected. A
maximum-likelihood solution is obtained for estimating the examinee’s
“true ability” from his responses to the items. The standard error of the
maximum-likelihood estimate is obtained, its relation to the discriminating
power of the test is pointed out, and some generalizations are drawn as to
the optimum level of item difficulty. The Neyman-Pearson power function
is applied to determine which of two psychological tests is the more powerful
for the selection of “successful” examinees.


When we use the usual type of mental test score to measure the ability 
of the examinees in a group, the metric supplied by the test scores cannot be 
considered satisfactory. The inadequate nature of this metric is apparent 
when we consider that two tests measuring the same ability, administered 
to the same group of examinees, may yield two score distributions of entirely 
different shapes. The metric of “true” scores obtained from very long tests 
is as subject to this objection as is the metric of fallible scores obtained from 
short tests. We here propose to use a more adequate metric for measuring 
the ability underlying the test score—a metric that will remain invariant 
from test to test—and to investigate what may be learned from a maximum- 
likelihood approach to the problem of estimating the examinee’s ability, as 
defined by this metric, and from certain other related approaches of modern 
statistical theory. The reader is warned that, in view of the heavy (but not 
insuperable) computational difficulties in the way of any practical applica- 
tion, the present discussion is directed chiefly towards determining what 
conclusions of general theoretical significance can be drawn from a con- 
sideration of the proposed metric. 

If the response of an examinee to each of $n$ test items may be scored either
0 or 1, we may denote the score of examinee $a$ on item $i$ by $x_{ia}$ ($i = 1, 2, \ldots,
n$; $a = 1, 2, \ldots, m$), where $x_{ia}$ equals either 0 or 1. For any effective test,
the logic of the practical situation implies some relation, in general, between
the probability that $x_{ia} = 1$, which we will denote by $\mathrm{Prob}(x_{ia} = 1)$, and
$\mathrm{Prob}(x_{ja} = 1)$, where $j$ denotes some item other than item $i$; and also some


*The author is indebted to Dr. John W. Tukey for helpful comments on a draft
of the present manuscript.

relation between $\mathrm{Prob}(x_{ia} = 1)$ and $\mathrm{Prob}(x_{ib} = 1)$, where $b$ denotes some
examinee other than examinee a. If these relations can be suitably specified 
mathematically, we can apply Fisher’s method of maximum likelihood 
(25, 1383-142; 16, 152-161) so as to obtain from the data on any actual set 
of examinees’ answer sheets maximum-likelihood estimates of the para- 
meters describing the test items and of the parameter describing the ability 
of each individual examinee. 

Once the parameters to be estimated have been satisfactorily specified 
and maximum-likelihood estimates for these parameters derived, a large 
body of standard mathematical theory can be brought to bear on many 
unresolved problems in testing. The maximum-likelihood estimate of the 
examinee’s ability itself constitutes an answer to the question of how the 
items should be weighted in obtaining the examinee’s total score. The 
discriminating power of the test at different ability levels can then, in large 
sample theory, be measured by the usual standard error of the maximum- 
likelihood estimate. The examinee’s responses to the test items may be 
used to set up a confidence interval (16, Ch. 11) within which the examinee’s 
true ability may be assumed to lie. The length of this confidence interval 
provides a measure of the test’s discriminating power at a given level of actual 
test score. If it is desired to build a test that will have maximum discriminat- 
ing power at a given cutting score, the problem of the optimum distribution 
of item difficulties in such a test can be reduced, in large-sample theory, to a 
question of determining for what values of the item difficulties the standard 
error of the test score is a minimum. Finally, if it is desired to test some 
hypothesis—for example that a given examinee’s true ability is above rather 
than below a given value—the Neyman-Pearson theory of testing hypotheses 
(25, 152 ff; 16, Ch. 12) can be brought to bear; for example, the power func- 
tion of any test score used to check this hypothesis can be determined and 
compared with the power functions of scores on other psychological tests in 
order to determine what sort of psychological test would be best for this 
purpose. 


I. The Case of Free-Response Items 


We will here assume, as have Guilford (6), Richardson (19), Mosier
(17, 18), Ferguson (3), Lawley (9, 10), Lorr (15), Tucker (23, 24), Lord
(13, 14), Cronbach and Warrington (2), and others, that the probability
($P_{ia}$) that an examinee will answer an item correctly is a normal ogive function
of his “true ability” ($c$) in the area measured by the test:


$$P_{ia} = \int_{(h_i - R_i c_a)/K_i}^{\infty} N(t)\,dt. \tag{1}$$


Here $c_a$ is a population parameter measuring the true ability of examinee $a$;
$h_i$ and $R_i$ are population parameters describing item $i$, as theoretically
determinable by administering the test to an infinitely large population of
examinees; $K_i$ is a function of $R_i$, as defined by the relation


$$K_i = \sqrt{1 - R_i^2} \tag{2}$$
(it being assumed that $R_i^2 \neq 1$); and the function $N(c)$ represents the normal
frequency function
$$N(c) = \frac{1}{\sqrt{2\pi}}\, e^{-c^2/2}.$$


It is assumed, given the true values of $h_i$, $R_i$, and $c_a$, that formula (1) for
$P_{ia}$ is not altered by the presence of additional information about the
performance of examinee $a$ on items other than item $i$ or about the performance
on item $i$ of examinees other than examinee $a$.

These assumptions are reasonable ones for the case where the test items 
cannot be answered correctly by guessing. An empirical study (13) indicates 
that these two assumptions may be used with considerable confidence for 
certain types of test material. Known methods are available in theory 
(although they may be excessively cumbersome in practice) for determining 
whether or not any given set of item-response data is compatible with the 
assumptions made. This may be done in large-sample theory by (a) estimating
the unknown parameters $h_i$, $R_i$, and $c_a$ from the data by maximum-likelihood
methods and (b) using chi-square to determine whether the observed
data could reasonably be assumed to have arisen by random sampling from 
a population of the type implied by equation (1) and characterized by the 
estimated values of the parameters. 

Equation (1) may be readily obtained, if desired, from the assumptions 
that 

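(a) $c$ is normally distributed,

(b) there is a continuous variable ($x_i'$) underlying the item,

(c) $c$ and $x_i'$ have a joint normal distribution with the correlation $R_i$, and

(d) $P_{ia}$ is the probability that $x_i' > h_i$ when $c = c_a$.


Under these assumptions, equation (1) for $P_{ia}$ follows directly from the
definition of $P_{ia}$ given in (d): with $c$ and $x_i'$ in standard-score form, the
distribution of $x_i'$ for fixed $c = c_a$ is normal with mean $R_i c_a$ and standard
deviation $K_i$, so the probability that $x_i'$ exceeds $h_i$ is the area of $N(t)$
above $(h_i - R_i c_a)/K_i$.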
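The derivation can be checked numerically. Assuming, as (a) to (d) imply, that $c$ and $x_i'$ are standard normal with correlation $R_i$, one can draw $x_i' = R_i c_a + K_i z$ with $z$ standard normal and count how often $x_i' > h_i$; the proportion should approach equation (1). The sketch below is a Monte Carlo illustration only; it uses the item parameters of Figure 1 and an arbitrary ability value, and the sample size and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def p_correct(c, h, r):
    """Equation (1): area of N(t) above (h - r*c) / sqrt(1 - r^2)."""
    k = sqrt(1.0 - r * r)
    return 1.0 - STD.cdf((h - r * c) / k)


def simulate(c, h, r, draws=100_000, seed=1):
    """Monte Carlo estimate of Prob(x' > h | c) under assumptions (a)-(d)."""
    rng = random.Random(seed)
    k = sqrt(1.0 - r * r)
    # Given c, x' = r*c + k*z has mean r*c and standard deviation k.
    hits = sum(r * c + k * rng.gauss(0.0, 1.0) > h for _ in range(draws))
    return hits / draws


h, r, c = -0.562, 0.531, 0.5  # Figure 1 item, an examinee with c = .5
print(round(p_correct(c, h, r), 3), round(simulate(c, h, r), 3))
```

Both numbers come out near .84; the simulated value differs from the formula only by sampling error.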

If the ability $c$ is normally distributed in the population of examinees,
it may be shown (23) that $R_i$ is the population biserial correlation between
$x_i$ and $c$; that $h_i$ is a measure of item difficulty related to the proportion ($p_i$)
of examinees in the population who answer item $i$ correctly by the equation

$$p_i = \int_{h_i}^{\infty} N(t)\,dt; \tag{3}$$

and (24) that $c$ is the common factor of the tetrachoric item intercorrelations.
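Equation (3) can be inverted. Since $p_i$ is the area of the normal curve above $h_i$, $h_i = \Phi^{-1}(1 - p_i)$, where $\Phi$ denotes the standard normal distribution function (a notation assumed here, not used in the original). Easy items, with large $p_i$, therefore have negative $h_i$. The following is a minimal sketch using Python's standard library; the function names and the proportions in the loop are illustrative.

```python
from statistics import NormalDist

STD = NormalDist()


def difficulty_from_p(p):
    """Solve equation (3), p = integral of N(t) from h to infinity, for h."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return STD.inv_cdf(1.0 - p)


def p_from_difficulty(h):
    """Equation (3) in the forward direction."""
    return 1.0 - STD.cdf(h)


for p in (0.10, 0.50, 0.713, 0.90):
    h = difficulty_from_p(p)
    print(f"p = {p:.3f}  ->  h = {h:+.3f}  ->  p = {p_from_difficulty(h):.3f}")
```

If ability is normally distributed, a proportion correct of about .713 corresponds to $h_i \approx -.562$, the value used for the item in Figure 1.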











Since c is the common factor of the items, it will remain invariant from 
one test to another measuring the same ability. The variable c thus provides 
for measuring the examinee’s ability the desired metric that remains invariant 
no matter what test of that ability is administered. This invariance, and the 
developments on the following pages, will hold whether or not c is normally 
distributed in the group tested. 

$P_{ia}$ has been called the “item characteristic function.” The relation
between $P_{ia}$ and $c_a$, as represented by equation (1), is illustrated in Figure
1 by a selected example.

[Figure 1: item characteristic curve, $P_{ia}$ (0 to 1.00) plotted against Ability ($c$) from -3 to 3.]
Figure 1
Item Characteristic Curve When $h_i = -.562$ and $R_i = .531$
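The curve of Figure 1 can be tabulated directly from equation (1). In this sketch (the function name is hypothetical) the probability passes .50 where $h_i - R_i c = 0$, that is, at $c = h_i / R_i$, which for this item is about $-1.06$; a larger $R_i$ would make the curve steeper.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def item_characteristic(c, h, r):
    """Equation (1): probability that an examinee of ability c answers correctly."""
    k = sqrt(1.0 - r * r)  # equation (2)
    return 1.0 - STD.cdf((h - r * c) / k)


h, r = -0.562, 0.531  # the item plotted in Figure 1
for c in range(-3, 4):
    print(f"c = {c:+d}   P = {item_characteristic(c, h, r):.3f}")
print("P = .50 at c =", round(h / r, 3))
```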


II. The Maximum-Likelihood Estimates 


Suppose we administer a sample of $n$ test items to a sample of $m$ examinees,
obtaining $mn$ item responses, i.e., $mn$ values of $x_{ia}$. Under the
assumptions made, the frequency function for each $x_{ia}$ may be written

$$f(x_{ia}) = P_{ia}^{x_{ia}} Q_{ia}^{1 - x_{ia}}, \tag{4}$$

where $Q_{ia} = 1 - P_{ia}$. The likelihood of the sample of $mn$ observed values
is therefore

$$L = \prod_{i=1}^{n} \prod_{a=1}^{m} P_{ia}^{x_{ia}} Q_{ia}^{1 - x_{ia}}. \tag{5}$$


We wish to use the method of maximum likelihood in order to determine 
what values of the unknown population parameters ($h_i$, $R_i$, and $c_a$) will
maximize the likelihood of the mn observed values. This problem is distinct 
from, although related to, the usual applications of the maximum-likelihood 
method to probit analysis, as treated, for example, by Finney (4). 

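We will need the following derivatives:

$$\frac{\partial P_{ia}}{\partial h_i} = -\frac{N_{ia}}{K_i}, \tag{6}$$

$$\frac{\partial P_{ia}}{\partial R_i} = \frac{N_{ia}\,(c_a - h_i R_i)}{K_i^3}, \tag{7}$$

$$\frac{\partial P_{ia}}{\partial c_a} = \frac{R_i N_{ia}}{K_i}, \tag{8}$$

where

$$N_{ia} = N\!\left(\frac{h_i - R_i c_a}{K_i}\right).$$

Now the logarithm of the likelihood is

$$\log L = \sum_{i}\sum_{a} \left[ x_{ia} \log P_{ia} + (1 - x_{ia}) \log Q_{ia} \right]. \tag{9}$$

Taking the appropriate derivatives of (9), we have

$$\frac{\partial \log L}{\partial h_i} = -\frac{1}{K_i} \sum_{a} \left[ \frac{x_{ia} N_{ia}}{P_{ia}} - \frac{(1 - x_{ia}) N_{ia}}{Q_{ia}} \right] \qquad (i = 1, 2, \ldots, n); \tag{10}$$

$$\frac{\partial \log L}{\partial R_i} = \frac{1}{K_i^3} \sum_{a} \left[ \frac{x_{ia} N_{ia}}{P_{ia}} - \frac{(1 - x_{ia}) N_{ia}}{Q_{ia}} \right] (c_a - h_i R_i) \qquad (i = 1, 2, \ldots, n); \tag{11}$$

$$\frac{\partial \log L}{\partial c_a} = \sum_{i} \left[ \frac{x_{ia} R_i N_{ia}}{K_i P_{ia}} - \frac{(1 - x_{ia}) R_i N_{ia}}{K_i Q_{ia}} \right] \qquad (a = 1, 2, \ldots, m). \tag{12}$$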
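Equations (9) and (12) can be checked against each other numerically. The analytic derivative with respect to one examinee's ability should agree with a finite-difference slope of the log-likelihood. The sketch below assumes a small, made-up set of item parameters and responses (`ITEMS` and `X` are hypothetical); only the formulas come from the text.

```python
from math import log, sqrt
from statistics import NormalDist

STD = NormalDist()

# Hypothetical item parameters (h_i, R_i) and one examinee's responses x_ia.
ITEMS = [(-1.0, 0.6), (-0.3, 0.5), (0.2, 0.7), (0.9, 0.4)]
X = [1, 1, 0, 1]


def terms(c, h, r):
    """Return P_ia, Q_ia, N_ia and K_i for one item at ability c."""
    k = sqrt(1.0 - r * r)
    g = (h - r * c) / k
    return STD.cdf(-g), STD.cdf(g), STD.pdf(g), k


def log_lik(c):
    """Equation (9), restricted to a single examinee."""
    total = 0.0
    for (h, r), x in zip(ITEMS, X):
        p, q, _, _ = terms(c, h, r)
        total += x * log(p) + (1 - x) * log(q)
    return total


def score(c):
    """Equation (12): d log L / d c_a."""
    total = 0.0
    for (h, r), x in zip(ITEMS, X):
        p, q, n, k = terms(c, h, r)
        total += x * r * n / (k * p) - (1 - x) * r * n / (k * q)
    return total


c, step = 0.3, 1e-5
numeric = (log_lik(c + step) - log_lik(c - step)) / (2 * step)
print(score(c), numeric)
```

The two printed values agree to many decimal places, as they should if (12) is the derivative of (9).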


The next step is to set equations (10), (11), and (12) each equal to zero, placing a circumflex (^) over those symbols that involve the population parameters $h_i$, $R_i$, and $c_a$, to indicate that these parameters have been replaced by the sample estimates $\hat h_i$, $\hat R_i$, and $\hat c_a$. For example, letting

$$\hat g_{ia} = \frac{\hat R_i \hat c_a - \hat h_i}{\sqrt{1 - \hat R_i^2}}, \qquad (13)$$










we will write

$$\hat N_{ia} = N(\hat g_{ia}), \qquad (14)$$

$$\hat P_{ia} = \int_{-\infty}^{\hat g_{ia}} N(t)\,dt. \qquad (15)$$


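From equation (11) we may therefore write

$$\sum_a \left[\frac{x_{ia}}{\hat P_{ia}} - \frac{1 - x_{ia}}{\hat Q_{ia}}\right]\hat N_{ia}\,\hat c_a - \hat h_i \hat R_i \sum_a \left[\frac{x_{ia}}{\hat P_{ia}} - \frac{1 - x_{ia}}{\hat Q_{ia}}\right]\hat N_{ia} = 0. \qquad (16)$$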
It may be seen from equation (10) that the second summation in (16) vanishes: it is the same sum over examinees that appears in (10), which has been set equal to zero.

Using $\sum_{i1}$ and $\sum_{i0}$ to denote summation over all items for which $x_{ia} = 1$ or 0, respectively, for a given $a$, and using $\sum_{a1}$ and $\sum_{a0}$ to denote summation over all examinees for whom $x_{ia} = 1$ or 0, respectively, for a given $i$, we may rewrite equations (10), (16), and (12), respectively, as follows:








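$$\sum_{a1} \frac{\hat N_{ia}}{\hat P_{ia}} = \sum_{a0} \frac{\hat N_{ia}}{\hat Q_{ia}}, \qquad (i = 1, 2, \ldots, n); \qquad (17)$$

$$\sum_{a1} \frac{\hat c_a \hat N_{ia}}{\hat P_{ia}} = \sum_{a0} \frac{\hat c_a \hat N_{ia}}{\hat Q_{ia}}, \qquad (i = 1, 2, \ldots, n); \qquad (18)$$

$$\sum_{i1} \frac{\hat R_i \hat N_{ia}}{\hat K_i \hat P_{ia}} = \sum_{i0} \frac{\hat R_i \hat N_{ia}}{\hat K_i \hat Q_{ia}}, \qquad (a = 1, 2, \ldots, m), \qquad (19)$$

where $\hat K_i = \sqrt{1 - \hat R_i^2}$.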





Equations (17), (18), and (19) theoretically can be solved so as to determine the desired values of $\hat c_a$, $\hat h_i$, and $\hat R_i$. The scale for $\hat c_a$ may be determined by imposing the conditions that $\sum_a \hat c_a = 0$ and $\sum_a \hat c_a^2 = m$. The value of $\hat c_a$ is the maximum-likelihood estimate of $c_a$, the measure of the examinee’s ability. In particular, if $c_a$ is normally distributed for any given group of examinees, then $\hat h_i$ is the maximum-likelihood estimate of the measure of item difficulty defined by equation (3), and $\hat R_i$ is the maximum-likelihood estimate of the biserial correlation of the item with ability. Unfortunately, in practice both $m$ and $n$ are usually moderately large numbers, so that any iterative solution of equations (17), (18), and (19) for values of $\hat c_a$, $\hat h_i$, and $\hat R_i$ will usually be lengthy by ordinary methods.

The item difficulties routinely obtained in item analyses are, of course, approximations to the values of $p_i$, and approximations to the values of $h_i$ may thus be readily obtained by means of equation (3). Similarly the item-test biserial correlations obtained in item analysis may be adjusted to obtain approximations to the values of $R_i$. Ways of refining these approximations may suggest themselves (13). In any case, if we assume that










the values of $h_i$ and $R_i$ for a given set of items have already been determined to an adequate approximation by a prior investigation, it would be feasible in actual practice to estimate the measure of a given examinee’s ability from his responses to the test items, using some method of successive approximation to solve equation (19) for $\hat c_a$.
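The paper leaves the method of successive approximation open. One possible choice, sketched below under the assumption that the item parameters $h_i$ and $R_i$ are known exactly, is Fisher scoring: each step adds the score (12) divided by its expected information, $\sum_i (R_i N_{ia}/K_i)^2/(P_{ia}Q_{ia})$. The function name, the starting value, and the cap of one unit on each step are illustrative choices of this sketch, not part of the original treatment.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal; STD.pdf plays the role of N(.)


def estimate_ability(x, h, R, c=0.0, tol=1e-10, max_iter=200):
    """Solve equation (19) for one examinee by Fisher scoring (illustrative).

    x    -- list of 0/1 item responses
    h, R -- item parameters, treated as known constants (an assumption)
    c    -- starting value for the ability estimate
    Returns None if every answer is right or every answer is wrong, since
    the likelihood then has no finite maximum.
    """
    if all(v == 1 for v in x) or all(v == 0 for v in x):
        return None
    for _ in range(max_iter):
        score = 0.0  # d log L / dc, equation (12)
        info = 0.0   # expected information: sum of (dP/dc)^2 / (P Q)
        for xi, hi, Ri in zip(x, h, R):
            K = math.sqrt(1.0 - Ri * Ri)
            g = (Ri * c - hi) / K
            P = STD.cdf(g)
            Q = 1.0 - P
            slope = Ri * STD.pdf(g) / K             # equation (8)
            score += slope * (xi - P) / (P * Q)
            info += slope * slope / (P * Q)
        step = max(-1.0, min(1.0, score / info))    # ad hoc cap on the step size
        c += step
        if abs(step) < tol:
            return c
    raise RuntimeError("Fisher scoring did not converge")


# Hypothetical five-item test with made-up item parameters
print(estimate_ability([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0], [0.4, 0.5, 0.6, 0.5, 0.4]))
```

Because the logarithm of the normal integral is concave, the log-likelihood is concave in $c_a$ for this model, so an examinee with at least one right and one wrong answer has a single finite root; the all-right and all-wrong cases are returned as `None`.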


III. The Special Case of Equivalent Free-Response Items 


In this and the following section, we will limit consideration to the 
special case where all items are “equivalent,” i.e., are of equal difficulty and 
are equally correlated with the ability measured. We will assume that $h_i$
and $R_i$ are known, at least to an adequate approximation. Since we will
consider only one examinee at a time, we will discontinue use of the subscript
$a$.

In this special case, only $x_i$ need be kept under the summation sign in equation (12); consequently, dropping the subscript $i$ from every symbol except $x_i$, we have

$$\frac{d \log L}{dc} = \frac{RN}{K}\left(\frac{s}{P} - \frac{n - s}{Q}\right), \qquad (20)$$

where $s = \sum_i x_i$ is the number of items answered correctly, i.e., $s$ is the usual test score. Replacing $Q$ by $1 - P$, we find from (20) that





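$$\frac{d \log L}{dc} = \frac{RN(s - nP)}{KPQ}. \qquad (21)$$

If we set (21) equal to zero, we find

$$\hat P = \frac{s}{n}, \quad \text{i.e.,} \quad \int_{-\infty}^{(R\hat c - h)/K} N(t)\,dt = \frac{s}{n}. \qquad (22)$$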


In the special case of “equivalent” items, we have thus found that $\hat c$, the maximum-likelihood estimate of the examinee’s ability, is a simple function of the usual type of test score, $s$. The value of $\hat c$ may be readily determined in any given case from equation (22) with the help of any standard table of areas under the normal curve.
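Solving (22) amounts to one inverse-normal lookup: if $g$ is the normal deviate whose lower-tail area is $s/n$, then $\hat c = (h + Kg)/R$. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed table of areas and treats $h$ and $R$ as known; the function name is illustrative.

```python
import math
from statistics import NormalDist


def c_hat_equivalent(s, n, h, R):
    """Equation (22): solve (area under N(t) up to (R*c - h)/K) = s/n for c.

    Assumes n "equivalent" free-response items with known h and R.
    Returns None for s = 0 or s = n, where (22) has no finite root.
    """
    if not 0 < s < n:
        return None
    K = math.sqrt(1.0 - R * R)
    g = NormalDist().inv_cdf(s / n)   # normal deviate whose lower-tail area is s/n
    return (h + K * g) / R


# Figure 2 example: items of 50-per-cent difficulty (h = 0), R = .30, 60 per cent right
print(round(c_hat_equivalent(6, 10, 0.0, 0.30), 2))  # 0.81

# R = .60: equal steps in s/n are unequal steps in c-hat near the top of the scale
for lo, hi in ((5, 6), (8, 9)):
    step = c_hat_equivalent(hi, 10, 0.0, 0.60) - c_hat_equivalent(lo, 10, 0.0, 0.60)
    print(f"s/n {lo / 10:.1f} to {hi / 10:.1f}: c-hat rises by {step:.2f}")
```

The first line reproduces the value .81 used in the discussion of Figure 2. For $R = .60$ the loop reports a rise of about 0.34 from $s/n = .5$ to .6 but about 0.59 from .8 to .9, which is the squeezing of the score scale at the extremes described there.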

Figure 2 presents for illustrative purposes the relation between $\hat c$ and
$s/n$ for two tests composed of “equivalent” items of 50-per-cent difficulty,
the value of R being .30 in one case, .60 in the other. For example, if an 
examinee answers 60 per cent of the items correctly on the former test, our 
estimate of his c-score is .81. This means that we estimate that his ability 
would place him .81 standard deviations above the mean in the basic group 
for which the values of h and R were calculated. 










The curves in Figure 2 relate the c-score metric with the metric provided by the usual test scores. In the case where $R = .30$, the relation of these two scales is practically linear between $\hat c = -2.5$ and $\hat c = +2.5$ (scores beyond ±2.5 will be obtained by only about 1 per cent of the examinees if $c$ is normally distributed in the basic group). In general, however, as illustrated by the curve for the case where $R = .60$, the difference between scores ($s/n$) of .50 and .60 represents less difference in underlying ability than does the difference between scores of .80 and .90. This fact is the result of the squeezing of the score scale at the extremes, arising from the impossibility of obtaining scores below 0 or above 1.00.

[Figure: $\hat c$-score (vertical axis, −3 to 3) plotted against the proportion of correct answers $s/n$ (horizontal axis, 0 to 1.0), one curve for each value of $R$]
FIGURE 2
Ability Score ($\hat c$) as a Function of the Proportion of Correct Answers ($s/n$) on Tests Composed of $n$ “Equivalent” Free-Response Items of 50-Per-Cent Difficulty Having Specified Values of $R$













A further interesting theoretical result about the test score is apparent upon examination of the frequency function of $x_{1a}, x_{2a}, \ldots, x_{na}$ for the case of “equivalent” items:

$$f(x_{1a}, x_{2a}, \ldots, x_{na}) = \prod_i P^{x_{ia}} Q^{1 - x_{ia}} = P^{\sum_i x_{ia}} Q^{\,n - \sum_i x_{ia}} = P^{s} Q^{\,n - s}. \qquad (23)$$


It is seen from (23), which involves the item responses only through their sum $s$, that in the case of a test composed of “equivalent” items the usual type of test score ($s$) is a sufficient statistic (16, 151) for estimating the examinee’s ability ($c$), and hence that the test score contains all the information, relevant to the estimation of the examinee’s ability, contained by the examinee’s responses to the test items.

In order to obtain a usable standard error of $\hat c$ for the case of “equivalent” items, we must agree in advance that we will never assign an infinitely large value of $\hat c$ to any examinee. When any examinee either answers all items correctly or answers all items incorrectly, there is no finite value of $\hat c$ that will satisfy equation (22);* in such cases we will assign some arbitrary value, $\hat c = 1{,}000$ or $\hat c = -1{,}000$, for example. When $n$ is large, such an occurrence will be so rare that our results will not be affected by this manipulation.

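With this understanding, the sampling error of $\hat c$ is approximately (16, 208 ff.)

$$\text{S.E.}_{\hat c} = \frac{1}{\sqrt{n}}\left\{E\left[\left(\frac{\partial \log f(x_i)}{\partial c}\right)^{2}\right]\right\}^{-1/2}, \qquad (24)$$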


where the operator $E$ indicates that the expected value is to be taken. Substituting the expression given in equation (4) for $f(x_i)$ into equation (24) and differentiating, we obtain

$$\text{S.E.}_{\hat c} = \frac{1}{\sqrt{n}}\left[\frac{R^2 N^2}{K^2 P^2 Q^2}\,E(x_i - P)^2\right]^{-1/2}. \qquad (25)$$

Now $P$ is the expected value of $x_i$, and consequently $E(x_i - P)^2 = PQ$, the variance of the binomial distribution when $n = 1$.
We thus have the final result that the sampling error of our estimate of the examinee’s ability is, for the case of equivalent items,

$$\text{S.E.}_{\hat c} = \frac{K\sqrt{PQ}}{\sqrt{n}\,RN}. \qquad (26)$$
*Dr. Tukey suggests defining $\hat c$ by the equation $\hat P = (s + \tfrac{1}{2})/(n + 1)$ in order to avoid this difficulty. See Tukey, J. W., and Freeman, M. F., Transformations related to the angular and the square root. Ann. math. Statist., 1950, 21, 607-610.













The thought suggests itself that this standard error may be used as a measure of the discriminating power of the test for examinees at a given level of ability. As a matter of fact, the mathematical expression for $\text{S.E.}_{\hat c}$ turns out to be identical with the reciprocal of the expression for a discrimination index previously developed (13) from an entirely different line of reasoning.

$\text{S.E.}_{\hat c}$ may be used to set up confidence intervals within which the true value of $c$ may be assumed to lie. Since confidence intervals obtained from maximum-likelihood estimates are known to be asymptotically shortest, unbiassed confidence intervals, the length of such a confidence interval might well be taken as a measure of the discriminating power of the test at a given level of test score. Illustrations of such confidence intervals will be given in a later section.


TABLE 1
The Standard Error of $\hat c$ for a Test Composed of $n$ “Equivalent” Free-Response Items,
for Specified Values of $P$ and $R$

P            R      S.E. of ĉ
.5           .30    3.98/√n
             .60    1.67/√n
.4 or .6     .30    4.03/√n
             .60    1.69/√n
.3 or .7     .30    4.19/√n
             .60    1.76/√n
.2 or .8     .30    4.54/√n
             .60    1.90/√n
.1 or .9     .30    5.44/√n
             .60    2.28/√n
Table 1 presents a few standard errors calculated from equation (26) for illustrative purposes. The relation between $P$ and $\text{S.E.}_{\hat c}$ for the case when $R = .30$ is also shown by the curve labeled $k = \infty$ in Figure 3.

We note that, for a given value of $R$, the standard error of the $\hat c$-score is proportional to $\sqrt{PQ}/N$, a quantity that increases as $P$ departs from .50. For large $n$, the length of the confidence interval within which the true ability of the examinee may be assumed to fall will be proportional to $\text{S.E.}_{\hat c}$. We thus see, at least for large $n$, that a given examinee’s ability ($c$) can be estimated more accurately by administering items that are of 50-per-cent difficulty for examinees like him than by administering items at any other single difficulty level.
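Equation (26) can be evaluated directly to recompute Table 1. In the sketch below, $N$ is taken as the normal ordinate at the deviate whose lower-tail area is $P$, since both are evaluated at the same point $(Rc - h)/K$. The recomputed values agree with the printed entries of Table 1 to within about .01 (for instance 3.985 against the printed 3.98 for $P = .5$, $R = .30$); the function name is illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def se_times_sqrt_n(P, R):
    """Equation (26) multiplied by sqrt(n): K*sqrt(P*Q) / (R*N).

    P and N are evaluated at the same deviate (R*c - h)/K, so N is the normal
    ordinate at the point whose lower-tail area is P.
    """
    K = math.sqrt(1.0 - R * R)
    N = STD.pdf(STD.inv_cdf(P))
    return K * math.sqrt(P * (1.0 - P)) / (R * N)


# Recompute the entries of Table 1 (P and 1 - P give the same value)
for P in (0.5, 0.6, 0.7, 0.8, 0.9):
    cells = "   ".join(f"R = {R:.2f}: {se_times_sqrt_n(P, R):.2f}/sqrt(n)" for R in (0.30, 0.60))
    print(f"P = {P:.1f}   {cells}")
```

Scanning down either column shows the growth of the standard error as $P$ moves away from .50, which is the basis for the conclusion just stated.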
















Similar or related conclusions have been reached by a number of writers 
(20, 19, 7, 1, 2, 21, 13, 14). Empirical evidence relating to this point has 
also been obtained (12, 22, 19). 


IV. The Case of Multiple-Choice Items 


Suppose that any examinee who does not know the answer to a multiple-choice item guesses at the answer with 1 chance in $k$ of guessing correctly. If we denote the item characteristic function for this case by $P'_{ia}$, we have

$$P'_{ia} = P_{ia} + \frac{Q_{ia}}{k}. \qquad (27)$$

We will give here only the results for the case of “equivalent” items. In this special case the procedures already outlined lead to the result that $c$ should be estimated from the equation

$$\hat P = \frac{1}{n}\left(s - \frac{n - s}{k - 1}\right). \qquad (28)$$

This result differs from equation (22) only in that the usual type of correction for guessing is applied to the test score. $\hat c$ may then be obtained from the $\hat P$ of equation (28) exactly as in (22).
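The multiple-choice estimate differs from the free-response one only in the corrected score fed to the normal-curve inversion. A minimal sketch, assuming known $h$ and $R$ and the guessing model of equation (27); the function name and the example item parameters are illustrative:

```python
import math
from statistics import NormalDist


def c_hat_multiple_choice(s, n, k, h, R):
    """Equation (28), followed by the normal-curve inversion used in (22).

    Assumes n "equivalent" k-choice items with known h and R, and the guessing
    model of equation (27) (1 chance in k of guessing an unknown item correctly).
    """
    P_hat = (s - (n - s) / (k - 1)) / n    # score corrected for guessing, per item
    if not 0.0 < P_hat < 1.0:
        return None                         # at/below chance, or perfect: no finite root
    K = math.sqrt(1.0 - R * R)
    return (h + K * NormalDist().inv_cdf(P_hat)) / R


# Forty-nine 5-choice items with h = 0 and R = .44721 (the item parameters of Test A)
for s in (9, 10, 30, 45, 49):
    print(s, c_hat_multiple_choice(s, 49, 5, 0.0, 0.44721))
```

A corrected score at or below zero, that is, a raw score at or below the chance level $n/k$, leaves no finite estimate, and the function signals this (like a perfect score) by returning `None`.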

It is readily verified that, in the case of tests composed of “equivalent” multiple-choice items, the usual type of test score corrected for guessing, $s - (n - s)/(k - 1)$, is a sufficient statistic for the estimation of the examinee’s ability ($c$).
The standard error of $\hat c$ for large $n$ is found to be approximately (cf. equation 26):

$$\text{S.E.}_{\hat c} = \frac{kK\sqrt{P'Q'}}{(k - 1)\sqrt{n}\,RN}. \qquad (29)$$

Figure 3 illustrates this relation of $\text{S.E.}_{\hat c}$ to $P'$ for various values of $k$ in the case when $R = .30$. Table 2 gives certain selected values of $\text{S.E.}_{\hat c}$ obtained from equation (29) for the same values of $k$ and $R$. For every value of $k$ in Table 2, two values of $P'$ are listed: (1) the value halfway between the chance success level and 1.00, and (2) the value of $P'$ for which $\text{S.E.}_{\hat c}$ is a minimum, as determined by numerical methods. It is frequently considered that the former value of $P'$ provides optimum discrimination, but present results indicate that the most reliable estimate of an examinee’s ability is obtained when the items are somewhat easier than this, i.e., when $P'$ is somewhat higher. Investigations carried through by Cronbach and Warrington (2) and by the present writer* into other theoretical measures of the discriminating power of actual test scores have led to the same conclusion: that optimum measurement of a given examinee’s ability by means of multiple-choice items requires an item difficulty level somewhat

*See also Hick, W. E., Information theory and intelligence tests. Brit. J. Psychol., Statist. Sect., 1951, 4, 157-164, where Shannon and Weaver's information theory is applied to a closely related problem.

























[Figure: standard error of ĉ (vertical axis) plotted against the population proportion of correct answers P' (horizontal axis, 0 to 1.0), one curve for each value of k]
FIGURE 3
The Standard Error of $\hat c$ as a Function of $P'$ for a Test Composed of “Equivalent” $k$-Choice Items, When $R$ is .30


TABLE 2
The Standard Error of $\hat c$ for a Test Composed of $n$ “Equivalent” Multiple-Choice Items,
for Specified Values of $P'$ and $k$, when $R$ is .30

k     P'       S.E. of ĉ
5     .6       4.88/√n
5     .682     4.80/√n
4     .625     5.14/√n
4     .713     5.03/√n
3     .6667    5.64/√n
3     .759     5.44/√n
2     .75      6.90/√n
2     .835     6.52/√n
















easier than halfway between the chance success level and 1.00. This conclusion 
may be rationalized, if desired, as being attributable to the fact that difficult 
multiple-choice items tend in general to be less valid than easy ones, since 
guessing is more often involved in answering a difficult item than in answering 
an easy one. 
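The entries of Table 2 and the location of the optimum can be checked numerically from equation (29). The sketch below converts $P'$ back to the guessing-free $P$ through equation (27), evaluates (29) multiplied by $\sqrt n$, and locates the minimizing $P'$ by a simple grid search, a stand-in for whatever numerical method was originally used. Its values agree with the printed Table 2 entries to within about .01; the function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def se_mc_times_sqrt_n(p_prime, k, R):
    """Equation (29) multiplied by sqrt(n): k*K*sqrt(P'Q') / ((k - 1)*R*N).

    P' is first converted back to the guessing-free P through equation (27),
    P' = P + Q/k; N is the normal ordinate at the deviate whose lower-tail area is P.
    """
    P = (k * p_prime - 1.0) / (k - 1)
    K = math.sqrt(1.0 - R * R)
    N = STD.pdf(STD.inv_cdf(P))
    return k * K * math.sqrt(p_prime * (1.0 - p_prime)) / ((k - 1) * R * N)


def optimal_p_prime(k, R, step=0.0005):
    """Grid search, strictly between chance level 1/k and 1, for the P' minimizing (29)."""
    grid = [1.0 / k + i * step for i in range(1, int((1.0 - 1.0 / k) / step))]
    return min(grid, key=lambda p: se_mc_times_sqrt_n(p, k, R))


R = 0.30
for k in (5, 4, 3, 2):
    halfway = (1.0 + 1.0 / k) / 2.0   # halfway between chance success and 1.00
    best = optimal_p_prime(k, R)
    print(f"k={k}: halfway P'={halfway:.3f} gives {se_mc_times_sqrt_n(halfway, k, R):.2f}/sqrt(n); "
          f"minimum at P'={best:.3f} gives {se_mc_times_sqrt_n(best, k, R):.2f}/sqrt(n)")
```

For $R = .30$ the grid minimum for $k = 5$ falls near .68, and for every $k$ it lies above the halfway value, in line with the conclusion just stated.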

V. Confidence Intervals 


It has already been pointed out that asymptotically shortest, unbiassed confidence intervals for estimating $c$ for a given examinee may be set up by the usual methods (16, Ch. 11) from a knowledge of $\hat c$ and $\text{S.E.}_{\hat c}$. When $n$ is not large, more exact methods may be used, as will be illustrated very briefly in the following.

The question is frequently raised as to what meaning, and how much 
meaning, we can attribute to obtained scores that are near the “ceiling” 
or the “floor” of the multiple-choice test administered, i.e., to scores that are
nearly perfect or that are close to the expected score for an examinee who 
answers all the items by random guessing. We can answer this question by 
determining a confidence interval within which the true ability of an examinee 
obtaining such a dubious score may be expected to lie. Such confidence inter- 
vals are shown graphically in Figure 4 for most of the possible scores on a 
hypothetical Test B. 

Test B is composed of forty-nine 5-choice items, each having $R = .44721$ (that is, $1/\sqrt{5}$). The frequency distribution of the item difficulties (expressed in terms of $h_i$) is roughly normal, $h_i$ having a mean of zero and a standard deviation of .79. The actual frequency distribution of $h_i$ is given in Table 3 (a multiple of


TABLE 3
Frequency Distribution of Item Difficulties ($h_i$ or $p'_i$) in Test B

h_i       p'_i    Frequency
 .8√5     .23             1
 .6√5     .27             3
 .4√5     .35             6
 .2√5     .50             9
  0       .60            11
−.2√5     .70             9
−.4√5     .85             6
−.6√5     .93             3
−.8√5     .97             1













$1/\sqrt{5}$ was chosen for $h_i$ in order to facilitate computation). The proportion ($p'_i$) of correct answers that will be given to item $i$ by a group of examinees whose abilities are normally distributed is related to $h_i$ by equation (3) and by the fact that $p'_i = p_i + q_i/k$ (cf. equation 27).

When $c$ is fixed, the sampling distribution of the obtained score ($s$) on Test B is readily obtained by replacing $P$ by $P'$ in equation (5) and then adding together all those values of $L$ for which $\sum_i x_{ia} = s$. The relative frequencies for all values of $s$ were thus calculated for fixed values of $c$ selected at intervals of 0.5. The corresponding cumulative frequencies, rounded off to two decimal places, are shown in Table 4, where each column represents the cumulative frequency distribution of $s$ for the corresponding specified value of $c$.
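Adding together the values of $L$ for all response patterns with the same total is most easily done by building the distribution of $s$ one item at a time. The sketch below does this for Test B, assuming $k = 5$, $R = 1/\sqrt5$ for every item, and $h_i = j/\sqrt5$ ($j = -4, \ldots, 4$) with the Table 3 frequencies; `statistics.NormalDist` supplies the normal integral, and the function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()
R_B = 1.0 / math.sqrt(5.0)  # R = .44721 for every item of Test B
# Test B as read from Table 3 (assumption): h = j/sqrt(5), j = -4..4, with these frequencies
TEST_B = [(j / math.sqrt(5.0), f) for j, f in zip(range(-4, 5), (1, 3, 6, 9, 11, 9, 6, 3, 1))]


def p_prime(c, h, R=R_B, k=5):
    """Equation (27): P' = P + Q/k, where P is the normal ogive at (R*c - h)/K."""
    P = STD.cdf((R * c - h) / math.sqrt(1.0 - R * R))
    return P + (1.0 - P) / k


def score_distribution(c, items=TEST_B):
    """Probabilities of s = 0, 1, ..., n for fixed c, responses independent given c."""
    dist = [1.0]                            # before any item: s = 0 with certainty
    for h, freq in items:
        p = p_prime(c, h)
        for _ in range(freq):               # fold in one item at a time
            new = [0.0] * (len(dist) + 1)
            for s, w in enumerate(dist):
                new[s] += w * (1.0 - p)     # item answered wrongly
                new[s + 1] += w * p         # item answered correctly
            dist = new
    return dist


def cumulative(c, s, items=TEST_B):
    """P(score <= s | c): one entry of Table 4 before multiplying by 100."""
    return sum(score_distribution(c, items)[: s + 1])


# Table 4 cells quoted in the text
for c, s in ((-3.5, 15), (-2.0, 16), (-0.5, 26)):
    print(f"c = {c:+.1f}: P(score <= {s}) = {100 * cumulative(c, s):.1f} per cent")
```

The three printed percentages should come out close to the Table 4 entries used in the text: 74 for $s = 15$ at $c = -3.5$, 24 for $s = 16$ at $c = -2.0$, and 50 for $s = 26$ at $c = -0.5$.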

Suppose now that we have administered Test B to John D., who obtained a score of 16. Reading across the row for a score of 15 in Table 4, we find that examinees whose c-score is −3.5 obtain scores of less than sixteen 74 per cent of the time; while from the row for a score of 16 we find that examinees whose c-score is −2.0 obtain scores of 16 or less 24 per cent of the time. Let us make the assertion that John’s true ability score ($c$) lies between −3.5 and −2.0. Now, sampling theory assures us (granted the validity of our basic assumptions) that if we make a large number of similarly derived statements, these statements will be correct at least 50 per cent (74 minus 24 per cent) of the time, on the average. The interval $-3.5 < c < -2.0$ is thus a 50-per-cent confidence interval for estimating $c$.

This confidence interval is shown in Figure 4 as a horizontal line opposite 
the score value of 16. The confidence interval for each possible raw score is 
also shown, within the limits of the graph. Actually, all intervals are de- 
termined so as to run from the 25-per-cent to the 75-per-cent level, rather 
than from the 24- to 74-per-cent level as described for the sake of simplicity 
in the illustration given. The necessary 25- and 75-per-cent points were 
obtained by curvilinear graphic interpolation. The dot near the middle of 
each confidence interval indicates the value of c for which the likelihood of 
occurrence of the given value of $s$ is greatest. This value is of course $\hat c$, the
maximum-likelihood estimate of $c$.
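Instead of curvilinear graphic interpolation, the 25- and 75-per-cent points can be located by bisection on the exact score distribution. In the sketch below (with the same reading of Test B: $k = 5$, $R = 1/\sqrt5$, $h_i = j/\sqrt5$ with the Table 3 frequencies), the lower limit is the $c$ at which scores below $s$ occur 75 per cent of the time and the upper limit is the $c$ at which scores of $s$ or less occur 25 per cent of the time, mirroring the two rows of Table 4 used for John D. Function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()
R_B = 1.0 / math.sqrt(5.0)  # assumed: R = 1/sqrt(5) for every item
TEST_B = [(j / math.sqrt(5.0), f) for j, f in zip(range(-4, 5), (1, 3, 6, 9, 11, 9, 6, 3, 1))]


def cumulative(c, s, items=TEST_B, R=R_B, k=5):
    """P(score <= s | c) for Test B, by convolving independent item responses."""
    dist = [1.0]
    for h, freq in items:
        P = STD.cdf((R * c - h) / math.sqrt(1.0 - R * R))
        p = P + (1.0 - P) / k                      # equation (27)
        for _ in range(freq):
            dist = [(dist[i] if i < len(dist) else 0.0) * (1.0 - p)
                    + (dist[i - 1] * p if i > 0 else 0.0)
                    for i in range(len(dist) + 1)]
    return sum(dist[: s + 1])


def solve_decreasing(f, target, lo=-6.0, hi=6.0, iters=50):
    """Bisection for f(c) = target, where f decreases as c increases."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confidence_interval(s, coverage=0.50):
    """Central interval for c given an obtained score s (0 < s < n).

    Lower limit: P(score <= s - 1 | c) = 1 - tail, i.e. P(score >= s) = tail.
    Upper limit: P(score <= s | c) = tail.
    """
    tail = (1.0 - coverage) / 2.0                  # .25 for a 50-per-cent interval
    lower = solve_decreasing(lambda c: cumulative(c, s - 1), 1.0 - tail)
    upper = solve_decreasing(lambda c: cumulative(c, s), tail)
    return lower, upper


print(confidence_interval(16))   # compare with the text: roughly -3.5 to -2.0
```

For $s = 16$ the limits should come out close to −3.5 and −2.0, consistent with the 24- and 74-per-cent illustration in the text; replacing the bisection with a finer search would not change this picture.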

These confidence intervals provide an answer to the question of what 
meaning and how much meaning to attach to the test scores—in particular 
to those near the ceiling or near the floor of the test. As previously pointed 
out, the length of the interval provides a measure of the discriminating 
power of the test at a given level of test score. Basically the length of the 
confidence interval is somewhat analogous to the “standard error of a true 
score” discussed in many texts (6, 414; 8, 43).

The longest confidence interval shown in full in Figure 4 is the one for
a score of 15. It extends roughly 1.75 units on the c-scale, from −4.0 to
−2.25. The shortest interval is probably the one for a score of 34, having a







TABLE 4
Theoretical Cumulative Frequency Distribution of Scores on Test B for Selected Fixed
Values of c (all frequencies multiplied by 100)

Score                          Selected Fixed Values of c
  s -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0 +0.5 +1.0 +1.5 +2.0 +2.5 +3.0 +3.5 +4.0
 49                                                                                  100
 48                                                                              99   95
 47                                                                         98   92   77
 46                                                                    98   92   76   50
 45                                                               99   95   80   53   26
 44                                                               97   87   62   31   11
 43                                                          99   93   73   41   15   03
 42                                                          97   85   56   24   06   01
 41                                                     99   93   72   38   12   02
 40                                                     98   86   56   23   05   01
 39                                                99   95   75   40   12   02
 38                                                98   89   61   26   06   01
 37                                                96   80   46   15   02
 36                                           99   93   69   32   07   01
 35                                           98   86   55   20   03
 34                                      99   95   77   41   11   01
 33                                      99   91   66   28   06
 32                                      97   84   52   18   03
 31                                 99   94   75   39   10   01
 30                                 98   90   64   27   05
 29                            99   96   83   51   17   02
 28                            99   93   74   38   10   01
 27                            98   88   62   27   05
 26                       99   95   81   50   17   03
 25                       98   91   71   37   10   01
 24                  99   96   85   60   26   06
 23                  98   93   77   47   17   03
 22             99   96   88   67   35   10   01
 21        99   98   93   81   55   24   05
 20        99   96   89   71   43   16   03
 19   99   97   93   81   60   31   09   01
 18   98   94   87   72   47   21   05   01
 17   95   90   80   61   35   13   02
 16   91   83   70   48   24   07   01
 15   [illegible]
 14   76   63   45   24   09   02
 13   [illegible]
 12   52   37   22   09   02
 11   39   25   13   05   01
 10   [illegible]
  9   16   09   03   01
  8   09   04   01
  7   04   02   01
  6   02   01
  5   01
  4






















[Figure: for each obtained Test Score (s) on the vertical axis, a horizontal confidence interval with a dot marking ĉ, plotted against True Ability Score (c) from −4 to 4 on the horizontal axis]
FIGURE 4
Fifty-Per-Cent Confidence Interval for Estimating True Ability Score (c) from Each
Obtained Score (s) on Test B


length of only 0.8 units approximately. It is again apparent that the best 
discrimination is obtained at score levels somewhat above the point halfway 
between a chance score and a perfect score. 


VI. Power Functions 


Let us suppose that, having standardized our test items on some basic
group, we wish to test the hypothesis that a certain examinee has a true
c-score of −0.5 or better. Suppose further that we wish to set up a 50-per-cent
critical region for testing this hypothesis. Referring to Table 4, we see
that 50 per cent of the examinees whose c-score is −0.5 will obtain actual
test scores of 27 or better on Test B, and 50 per cent will obtain actual test










scores of 26 or worse. We may, therefore, decide that we will administer Test 
B to the examinee and accept the hypothesis if he obtains a score of 27 or 
better, reject the hypothesis if he obtains a score of 26 or worse. 

The power function for this test of the hypothesis is the probability of 
rejecting the hypothesis, this probability being considered as a function of 
the true c-score. This function may be read off from Table 4 for selected 
values of c. It is given by the broken line in Figure 5. 











[Figure: power functions plotted against True Ability Score (c); vertical axis from 0 to 1.00]
FIGURE 5
Power Function for Test A (solid line) and for Test B (broken line) for Testing the
Hypothesis that the Examinee’s True Ability (c) Is Greater Than −0.5


We may use the power function to determine which of two tests measuring 
the same ability is better for the purpose of testing the stated hypothesis. 
The solid line in Figure 5 represents the power function of another test, which 
we will call Test A, composed of forty-nine 5-choice items each having $R = .44721$
and $h = 0$. Test A is identical with Test B except that in Test A all
items are of 50-per-cent difficulty after correction for guessing. As shown by 
the figure, Test A gives a lower probability than Test B of rejecting the 
hypothesis when it is true, and a higher probability of rejecting it when it 
is false. It is thus seen that Test A provides the more powerful test for the 
stated hypothesis. 
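The comparison of Figure 5 can be reproduced from exact score distributions. The text fixes Test B's critical score at 26 but does not state Test A's, so the sketch below assumes the same rule for both tests: reject the hypothesis when the score is at or below the value whose cumulative probability at $c = -0.5$ is nearest .50. Test A is taken as forty-nine items with $h = 0$ and $R = 1/\sqrt5$, as described above, and Test B as read from Table 3 ($h_i = j/\sqrt5$); function names are illustrative.

```python
import math
from statistics import NormalDist

STD = NormalDist()
R = 1.0 / math.sqrt(5.0)   # R = .44721 for every item in both tests
TEST_A = [(0.0, 49)]       # forty-nine items with h = 0
TEST_B = [(j / math.sqrt(5.0), f) for j, f in zip(range(-4, 5), (1, 3, 6, 9, 11, 9, 6, 3, 1))]


def score_distribution(c, items, k=5):
    """P(s = 0..n | c) for k-choice items, responses independent given c."""
    dist = [1.0]
    for h, freq in items:
        P = STD.cdf((R * c - h) / math.sqrt(1.0 - R * R))
        p = P + (1.0 - P) / k                       # equation (27)
        for _ in range(freq):
            dist = [(dist[i] if i < len(dist) else 0.0) * (1.0 - p)
                    + (dist[i - 1] * p if i > 0 else 0.0)
                    for i in range(len(dist) + 1)]
    return dist


def critical_score(items, c0=-0.5):
    """Assumed rule: the s* whose cumulative probability at c0 is nearest .50.

    The hypothesis c >= c0 is rejected when the obtained score is s* or less.
    """
    cum, best, best_gap = 0.0, 0, float("inf")
    for s, w in enumerate(score_distribution(c0, items)):
        cum += w
        if abs(cum - 0.5) < best_gap:
            best, best_gap = s, abs(cum - 0.5)
    return best


def power(c, items, cut):
    """Probability of rejecting the hypothesis when the true ability is c."""
    return sum(score_distribution(c, items)[: cut + 1])


cut_a, cut_b = critical_score(TEST_A), critical_score(TEST_B)
for c in (-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"c={c:+.1f}  Test A {power(c, TEST_A, cut_a):.2f}  Test B {power(c, TEST_B, cut_b):.2f}")
```

Under these assumptions the rule recovers Test B's critical score of 26, and Test A should show the steeper curve: a higher rejection probability than Test B below $c = -0.5$ and a lower one above it.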


Discussion 


The “ability score” ($\hat c$) developed here is a measure of ability that will
remain unchanged, except for sampling fluctuations, no matter which of 
various different tests of the same ability is administered, provided always 










that the basic assumption represented by equation (1) is fulfilled. This score will have the usual valuable characteristics of a maximum-likelihood estimate.* In sufficiently large samples the $\hat c$-score will (1) approximate closely to the true value of $c$, (2) have an approximately normal sampling distribution, and (3) have a minimum sampling error in comparison with other comparable statistics that might be used for estimating $c$. Unfortunately, we are not, at present, able to describe the properties of $\hat c$ for small $n$, nor to state how many items are required to constitute a “large” sample.

It has further been shown that $\hat c$ is a sufficient statistic when the test is composed of equivalent items, i.e., that the $\hat c$-score contains all the information contained by the examinee’s responses relevant to the estimation of his ability. This property of the $\hat c$-score holds for small as well as large samples.

Suppose that a large number of free-response items all measuring the
same ability have been “calibrated” on some large basic group of examinees,
i.e., that the values of $h_i$ (difficulty index) and $R_i$ (item discriminating power)
for each item have been determined for this group with a sufficient degree of 
accuracy. Suppose next that statistical tests disclose that the data are com- 
patible with the basic assumptions made here in relation to the item character- 
istic curves (some empirical evidence is available (13) to indicate that this 
supposition is not unreasonable, at least for certain types of test data). 
Finally, suppose that a variety of tests are built from various items selected 
from this calibrated pool, and that one such test is administered to each of a 
number of different examinees or groups of examinees. 

If the new groups tested are sufficiently similar to the basic group so 
that the item characteristic curves remain unchanged and independent, we 
can theoretically obtain for each examinee from the formulas derived here a
$\hat c$-score that will place all examinees on the same ability scale, i.e., on the
scale on which the basic group (for which the items were calibrated) has
a mean of zero and a standard deviation of 1. This may theoretically be done
even though the different groups of examinees are at somewhat different
ability levels, and even though they take entirely different tests. Unfortunately,
even for free-response items, the necessary computations would in
general be very onerous.† An exception is the special case where each test
is composed of equivalent items. 


*This statement is not definitely known to hold for the $\hat c$-score obtained in Section II, although it clearly holds for the cases discussed in all other sections. In Section II the values of $h_i$ and $R_i$ are to be estimated simultaneously with the values of $c_a$, whereas in the other sections the item parameters are treated either as known or as equal for all items. In Section II we are estimating $m + 2n$ unknown parameters from $mn$ observations (the $mn$ values of $x_{ia}$). The number of observations per parameter to be estimated thus becomes large as $m$ and $n$ become large, so that it seems reasonable to expect the maximum-likelihood estimates to have the usual optimum properties. This conclusion is not known to have been rigorously proved, however, nor, to the writer’s knowledge, has such a situation even been discussed in the literature.

†Dr. Ledyard R Tucker has devised a least-squares method (unpublished) for estimating $h_i$, $R_i$, and $c_a$ simultaneously that is feasible for large $m$ and moderately large $n$. Consideration is being given to the possibility of using this method in the actual scaling of published tests.










In the case of multiple-choice tests the application of the formulas in 
a practical situation is more problematical. Although in theory it is often 
assumed that 1/k (the probability of guessing the correct answer to an item) 
is equal to the reciprocal of the number of responses per item, actual data 
usually show that k is often less than this, and that it varies from item to 
item. 

Regardless of these practical difficulties, there is value in the theoretical 
conclusions to which the present approach leads. Included in these con- 
clusions are the following: 


(1) In the case of a test composed of “equivalent” free-response items, 
the usual type of test score is a sufficient statistic for estimating the 
examinee’s ability (c). 

(2) The standard error of $\hat c$ may be used as a measure of the discriminating power of the test for examinees at a given level of ability.

(3) Confidence intervals may be set up within which the examinee’s true 
ability (c) may be assumed to lie. The length of such a confidence 
interval may be taken as a measure of the discriminating power of 
the test for examinees at a given level of test score. 

(4) A given examinee’s ability can be estimated more accurately by 
administering free-response items that are of 50-per-cent difficulty 
for examinees like him than by administering free-response items 
at any other single difficulty level. 

(5) In the case of a test composed of “equivalent” multiple-choice 
items the usual type of test score corrected for guessing is a sufficient 
statistic for estimating the examinee’s ability. 

(6) Optimum measurement of a given examinee’s ability level by means 
of “equivalent” multiple-choice items requires an item difficulty
level somewhat easier than the halfway point between chance-success 
and 100-per-cent correct answers. 


An illustration has been given of the use of the Neyman-Pearson power 
function to determine which of two tests measuring the same ability will be 
more powerful for the purpose of discriminating those examinees whose
c-score lies below some predetermined value from those examinees whose 
c-score lies above this value. 

In closing, it may be noted that the problem of estimating the c-scores 
of a group of examinees is very similar to the problem of determining latent 
structure (11, 5). Instead of estimating the shape of the distribution of 
c-scores in the group tested, as would Lazarsfeld, we here estimate the c-score 
of each individual examinee. 

REFERENCES 
1. Brogden, H. Variation in test validity with variation in the distribution of item difficulties, number of items, and degree of their intercorrelation. Psychometrika, 1946, 11, 197-214.
2. Cronbach, L. J., and Warrington, W. G. Efficiency of multiple-choice tests as a function of spread of item difficulties. Psychometrika, 1952, 17, 127-147.
3. Ferguson, G. A. Item selection by the constant process. Psychometrika, 1942, 7, 19-29.
4. Finney, D. J. Probit analysis. Cambridge: Cambridge Univ. Press, 1947.
5. Green, B. F. Latent class analysis: A general solution and an empirical evaluation. Ph.D. thesis, Princeton University, 1951.
6. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.
7. Gulliksen, H. The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 1945, 10, 79-91.
8. Gulliksen, H. Theory of mental tests. New York: John Wiley and Sons, 1950.
9. Lawley, D. N. On problems connected with item selection and test construction. Proc. roy. Soc. Edin., 1943, 61-A, Part 3, 273-287.
10. Lawley, D. N. The factorial analysis of multiple item tests. Proc. roy. Soc. Edin., 1944, 62-A, Part I, 74-82.
11. Lazarsfeld, P. F. (with S. A. Stouffer et al.). Measurement and prediction, Vol. 4 of studies in social psychology in World War II. Princeton: Princeton Univ. Press, 1950, Chs. 10 and 11.
12. Long, J. A., and Sandiford, P. The validation of test items. Bulletin No. 3, Department of Educational Research, University of Toronto, 1935.
13. Lord, F. M. A theory of test scores. Psychometric Monograph No. 7, 1952.
14. Lord, F. M. The relation of the reliability of multiple-choice tests to the distribution of item difficulties. Psychometrika, 1952, 17, 181-194.
15. Lorr, M. Interrelationships of number-correct and limen scores for an amount limit test. Psychometrika, 1944, 9, 17-30.
16. Mood, A. M. Introduction to the theory of statistics. McGraw-Hill, 1950.
17. Mosier, C. I. Psychophysics and mental test theory: fundamental postulates and elementary theorems. Psychol. Rev., 1940, 47, 355-366.
18. Mosier, C. I. Psychophysics and mental test theory. II. The constant process. Psychol. Rev., 1941, 48, 235-249.
19. Richardson, M. W. Relation between the difficulty and the differential validity of a test. Ph.D. thesis, University of Chicago, 1936. Also Psychometrika, 1936, 1 (No. 2), 33-49.
20. Symonds, P. M. Choice of items for a test on the basis of difficulty. J. educ. Psychol., 1928, 19, 73-87.
21. Thorndike, R. L. Personnel selection. New York: John Wiley and Sons, 1949, pp. 228-230.
22. Thurstone, T. The difficulty of a test and its diagnostic value. J. educ. Psychol., 1932, 23, 335-343.
23. Tucker, L. R. Maximum validity of a test with equivalent items. Psychometrika, 1946, 11, 1-13.
24. Tucker, L. R. A method for scaling ability test items in difficulty taking item unreliability into account. Amer. Psychologist, 1948, 3, 309-310. (Abstract)
25. Wilks, S. S. Mathematical statistics. Princeton: Princeton Univ. Press, 1944.


Manuscript received 2/26/52 


Revised manuscript received 8/29/52 








PSYCHOMETRIKA—VOL. 18, No. 1 
MARCH, 1953 


A REVISED ORTHOGONAL ROTATIONAL SOLUTION FOR 
THURSTONE’S ORIGINAL PRIMARY MENTAL ABILITIES 
TEST BATTERY 


Wayne S. ZIMMERMAN 


BRANDEIS UNIVERSITY 


By extension of the rotational process, meaningful orthogonally related 
positions were found for all of the thirteen centroid factors which Thurstone 
extracted from his original PMA intercorrelations. Most of the original 
primary ability factors were more sharply delineated and corresponded more 
closely to the Army Air Force factors that bear similar names (demonstrating 
greater invariance from analysis to analysis). Although such different results, obtained by two investigators applying the same methods to the same data, may cause some concern, they strengthen rather than weaken the idea that more psychological meaningfulness and greater invariance will result if the axes are rotated using the concepts of simple structure and positive manifold.


Introduction 


In naming the rotated factors in his early classical fifty-seven variable 
analysis, Thurstone (1) exercised caution. Of thirteen factors extracted, he 
identified only seven with assurance and two others tentatively. He did not 
try to rationalize the test loadings on axes 10, 11, 12, and 13. 

Two of the last four columns apparently were rotated toward a simple configuration, since both axis 10 and axis 12 meet the tests of simple structure and positive manifold as well as, or better than, at least one of the factors that was identified. Residual axis 10, with a minimum value of -.077, contained thirty-four entries within the range of ±.20, which Thurstone described as nearly vanishing, plus five above .40, while residual axis 12, with a minimum value of -.10, contained forty entries within ±.20 and five above .40.*

One reason these rotated factors were not named is that each of them groups together significantly loaded tests that have no apparent functional unity.

The abundance of variance left on these residual factors furnished one stimulus for seeking a revised rotational solution. Also, one of the residuals promised upon further rotation to represent a spatial-visualization factor in addition to primary factor S (Spatial) (2, 270). Inspection of the over-all factor configuration suggested that if the rotational process were continued, the reallocation of variance among the primaries and the residuals should yield factors that would parallel more closely similar factors in Army Air Force data (3). Thus it appeared that greater invariance could be demonstrated and that the demands of simple structure and of positive manifold would be met more effectively.

*Thurstone considered loadings between ±.20 as negligible and considered only loadings of at least .40 in naming factors.


Method 


Further rotations were planned using the final published values as a 
starting point. Values were plotted and the axes were then rotated orthog- 
onally using a simplified graphical method (4). Eighty-six additional 
pair-by-pair rotational adjustments were required. 
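Each pair-by-pair adjustment is a planar rotation: two factor columns are turned through a common angle while all other columns are left unchanged, so every test keeps its communality (its sum of squared loadings). The Python sketch below shows that algebra on a small hypothetical matrix. It is not a reconstruction of the graphical procedure cited as (4), and the function names and sample loadings are assumptions made for illustration.

```python
import math


def rotate_pair(loadings, j, k, theta):
    """Orthogonally rotate factor columns j and k through angle theta (radians).

    `loadings` is a list of rows, one row of factor loadings per test.
    All columns other than j and k are copied unchanged.
    """
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for row in loadings:
        new_row = list(row)
        new_row[j] = c * row[j] + s * row[k]
        new_row[k] = -s * row[j] + c * row[k]
        rotated.append(new_row)
    return rotated


def communality(row):
    """Sum of squared loadings of one test across all factors."""
    return sum(a * a for a in row)


# Hypothetical loadings of three tests on three factors (not from Table 1).
A = [[0.60, 0.30, 0.10],
     [0.20, 0.55, 0.05],
     [0.40, 0.40, 0.20]]
B = rotate_pair(A, 0, 1, math.radians(15))
for before, after in zip(A, B):
    # The two printed communalities agree for every test.
    print(round(communality(before), 4), round(communality(after), 4))
```

Because a product of planar rotations is itself an orthogonal transformation, any sequence of such adjustments redistributes each test's common variance among the factors without changing its total.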


Results 


On the following pages the rotated factors described by Thurstone are 
compared with the factors after the additional rotational adjustments. 
Thurstone’s rotated factorial matrix (1, 115, 116) is not reproduced here, 
although loadings pertinent to comparison with the new results are listed for 
each factor. The new rotated factorial matrix is presented in Table 1. 
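The factor-by-factor comparisons rest on three statistics for a rotated column: the number of entries in the nearly vanishing band of ±.20, the minimum entry (a check on positive manifold), and the number of loadings of .40 or more. A minimal Python sketch of that tabulation follows; the function name, the inclusive treatment of the band edges, and the sample column are assumptions, not values taken from either rotated matrix.

```python
def column_summary(column, near_zero=0.20, salient=0.40):
    """Summarize one factor column (a list of loadings).

    Returns the count of entries within +/-near_zero (band edges assumed
    inclusive), the minimum entry, and the count of entries >= salient.
    """
    vanishing = sum(1 for a in column if -near_zero <= a <= near_zero)
    high = sum(1 for a in column if a >= salient)
    return {"near_zero": vanishing, "minimum": min(column), "salient": high}


# Hypothetical column, for illustration only.
example = [0.05, -0.077, 0.12, 0.45, 0.62, 0.18, -0.03, 0.41, 0.25]
print(column_summary(example))  # {'near_zero': 5, 'minimum': -0.077, 'salient': 3}
```

On these criteria a column approaches simple structure and positive manifold when the near-zero count is large, the minimum is close to zero, and only a few tests are salient.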

All but one of the seven original factors that were identified with assurance are directly comparable. Nevertheless, there are some significant and interesting differences between the two solutions. As a result of the new rotations, two of the factors previously identified at least tentatively were changed enough to warrant re-evaluation. Four other factors were added, since all four of the residuals were rotated to meaningful positions.*

*In a bi-factor analysis of the same data, Holzinger and Harman identified a general factor, seven group factors, and two doublets. They did not isolate I and R, but they related six of the group factors and one of the doublets to the factors Thurstone identified. Their second doublet, “Rhythm,” can be directly related to the “Classification” factor described in this report. The two highest loadings on their seventh group factor, “Analogies,” were only .27 and .17, respectively (5).


Definition of the Rotated Factors
and Comparison of Results in the Two Solutions


Spatial Relations. Thurstone’s first rotated factor was his factor S. It contained thirty-six projections in the nearly vanishing range of ±.20, and the lowest value was -.092. After the new rotations there were thirty-seven nearly vanishing entries, the lowest being -.14. The thirteen tests with loadings of .40 or more in at least one rotational solution follow.†

†For each factor, the tests are listed in the order of their loadings in the revised solution.


SPATIAL RELATIONS

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(20) Flags                        .636           [illegible]
(22) Lozenges B                   .633           .604
(18) Cubes                        .626           .592
(53) Hands                        .455           .547
(17) Block Counting               .413           .524
(27) Pursuit                      .584           .513
(23) Surface Development          .651           .500
(19) Lozenges A                   .448           .400
(45) Syllogisms                   .430           .398
(21) Form Board                   .415           .317
( 8) Figure Classification        .393           .222
( 6) Verbal Classification        .411           [illegible]
(55) Sound Grouping               .412           .211


In the revised solution, Flags was more clearly the best representative. Hands and Block Counting also gained significantly, while Pursuit and especially Form Board, Figure Classification, Verbal Classification, and Sound Grouping lost significant weight. Variance on the Form Board test transferred to a new Visualization factor; variance on Figure Classification, to the Perceptual Speed factor; and variance on Sound Grouping, to another reasoning factor. Thus the readjustments tended to purify the factor, effecting a significant improvement in structure and psychological meaningfulness.*

*See the description of the Visualization factor and the final paragraph under the Eduction of Relationships factor.

Perceptual Speed. Axis II, after rotation, defined Thurstone’s Perceptual Speed factor. His rotations gave thirty-seven nearly vanishing entries, with a minimum value of -.072. In the new rotations, forty-two tests appeared in the ±.20 range, and the minimum value was -.128.


PERCEPTUAL SPEED

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(26) Identical Forms              .603           .728
( 6) Verbal Classification        .537           .581
(59) Word Count                   .360           .486
( 7) Word Grouping                .573           .376
(51) Picture Recall               .545           .341
(11) Completion                   .422           .311
(14) Disarranged Sentences        .461           .300
(60) Vocabulary                   .412           .291
(44) Pattern Analogies            .435           .271
(41) Verbal Analogies             .417           [illegible]


The revised rotations strengthened the loadings of Identical Forms primarily and of Verbal Classification secondarily on the Perceptual Speed factor. Word Count also gained, but most other tests lost weight in the exchange. Seven tests with loadings ranging from .417 to .573 dropped out of the significant class of .400 and above. Word Grouping, Pattern Analogies, and Verbal Analogies shifted weight to a new reasoning factor. Vocabulary and Completion contributed much of their former perceptual variance to the Verbal factor. Picture Recall and Disarranged Sentences gave up perceptual variance to a new factor (Memory for Observed Relationships) on which they appeared together.


TABLE 1
Primary Mental Abilities Analysis (Revised Rotation)
Modified Orthogonal Factorial Matrix
[Loadings of the tests on rotated factors I through XIII; the numerical entries are not legible in this copy.]

In describing the factor in which Identical Forms has the greatest 
saturation, Thurstone wrote, 


A hypothesis which agrees with introspective study of the mental operation essential in these tests is that the factor is essentially perceptual in character. Strictly speaking, all the tests in the battery involve perception and vision in particular. . . . The perceptual function here seems to be a facility in perceiving detail that is imbedded in irrelevant material (1, 80-81).


The perceptual element in Picture Recall, Word Grouping, Disarranged 
Sentences, and in the Thorndike Vocabulary test was rationalized originally 
as due to a low level of item difficulty, emphasizing the speed element. In 
the new rotations, the same explanation seems less strained in view of the 
diminished loadings. 

Numerical. Axis III yielded another of Thurstone’s primary abilities. The original rotated loadings contained thirty-eight within the range of ±.20, with a minimum of -.14. In the new solution, forty-five loadings fell within ±.20, the lowest being -.10.


NUMERICAL

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(33) Multiplication               .812           .769
(31) Addition                     .755           .764
(32) Subtraction                  .670           .659
(30) Number Code                  .625           .619
(34) Division                     .619           .584
(35) Tabular Completion           .392           .402
(38) Numerical Judgment           .432           .345


The structure of the Numerical factor remained essentially unchanged. 
Apparently the simple functional operations of addition, multiplication, 
subtraction, and division are basic to this ability, with multiplication and 
addition representing it best. 
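One way to put a number on “essentially unchanged” is Tucker’s coefficient of congruence, the cosine between two columns of loadings. The coefficient is not used in this article, and the Python sketch below applies it only to the seven Numerical loadings listed above, so it is an illustration under that restriction rather than a comparison of the full columns.

```python
import math


def congruence(x, y):
    """Tucker's coefficient of congruence: sum(x*y) / sqrt(sum(x^2) * sum(y^2))."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Numerical factor loadings from the table above, in the order listed.
thurstone = [0.812, 0.755, 0.670, 0.625, 0.619, 0.392, 0.432]
revised = [0.769, 0.764, 0.659, 0.619, 0.584, 0.402, 0.345]
print(round(congruence(thurstone, revised), 4))
```

The value comes out above .99. Because only the salient tests enter the sum, the figure for the complete column of sixty tests could differ.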

Verbal. Rotated Factor IV is Thurstone’s Verbal Relations. Thirty loadings fell within the ±.20 range, the strongest negative being -.065. The new rotations yielded thirty-two “negligible” values and a minimum of -.137. Although in both rotational solutions the axis was projected through the large cluster of verbal tests, its direction differed significantly.


VERBAL

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(58) Vocabulary (Chicago)         .395           .763
( 5) Reading II                   .506           .706
(60) Vocabulary                   .385           .676
( 4) Reading I                    .552           .638
(10) Inventive Opposites          .639           .549
(11) Completion                   .300           .541
(16) Inventive Synonyms           .495           .478
( 7) Word Grouping                .456           .478
(40) Reasoning                    .420           .465
(41) Verbal Analogies             .597           .459
(52) Theme                        [illegible]    .435
(56) Spelling                     .386           .433
(57) Grammar                      .498           .420
(42) False Premises               .424           .391
(55) Sound Grouping               .453           .300
( 9) Controlled Association       .450           .222
(14) Disarranged Sentences        .395           [illegible]


Inventive Opposites is highest in Thurstone’s solution, while the new 
location of the axis places four other tests, Vocabulary (Chicago), Reading 
II, Vocabulary, and Reading I at the top of the list. The first three and 
also Completion gained considerable weight at the expense of the Restrictive 
Reasoning factor. The two reading tests picked up significant weight from 
Residual Factor 12. Vocabulary and Completion drew also from Perceptual 
Speed, while Inventive Opposites, Verbal Analogies, Sound Grouping, Con- 
trolled Association, and Disarranged Sentences transmitted weight to new 
factors. The major portion of the transferred variance of Inventive Opposites 
and Controlled Association went to a new verbal fluency factor; of Verbal 
Analogies to a new relationship factor; of Sound Grouping to a new classi- 
fication factor; and of Disarranged Sentences to a new memory factor. 

In his discussion of the Verbal Relations factor, Thurstone wrote, “In 
all these tests the subject must deal with ideas, and the factor is evidently 
characterized primarily by its reference to ideas and the meaning of words.” 
(1, 84). With the vocabulary tests now heading the list, even greater stress is placed upon knowledge of the meanings of words. At the same time the factor is more clean-cut and univocal.

Rote Memory. Factor M (Memory) originally had thirty-seven entries in the ±.20 range and a minimum of -.080. After the additional rotations, thirty-five values were within ±.20, with a minimum of -.116.


MEMORY

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(48) Number-Number                .664           .709
(47) Initials                     .487           .528
(46) Word-Number                  .529           .518
(50) Figure Recognition           .420           .514


In the second solution Number-Number continued to represent the factor best, extending its projection from .664 to .709. Initials and Figure Recognition also gained as a result of the new rotations. All four of the leading tests were designed as memory tests, and the first three are in paired-associates form. Whatever the emphasis, “There seems to be no doubt that this factor is concerned with memory.” (1, 85). Of three memory factors listed by the aviation psychologists, one identified as “Paired-Associates Memory” compares closely with this factor (3, 823, 826).

Letter Fluency. Thurstone’s rotations for factor W, like the new rotations, yielded thirty-nine values within the range of ±.20, with a minimum of -.13.


LETTER FLUENCY

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(15) Anagrams                     .534           .552
(12) Disarranged Words            .612           .519
(57) Grammar                      .530           .518
(56) Spelling                     .508           .463
(13) First and Last Letter        .388           .448
(60) Vocabulary                   .413           .386


Anagrams, Disarranged Words, Grammar, and Spelling continued with 
the highest loadings. Projections for Spelling were shortened and for First 
and Last Letter lengthened somewhat, but the differences were slight. 

All of the tests deal with verbal materials, but stress is placed on familiarity with word structure rather than on word meaning. Thurstone wrote, “The factor W seems to have as its principal characteristic a fluency in dealing with words. This factor seems to be separate from the verbal factor V, which is concerned with ideas and meanings.” (1, 85).

The term “Word Fluency” now seems unfortunate insofar as it implies general verbal fluency, or ease of evoking words appropriate in meaning. A new factor, revealed in the revised solution, fits this description better. “Fluency with letters” seems more descriptive of factor W, implying easy recognition and manipulation of letter patterns. Apparently some individuals are able to recall words fluently or to construct them readily without necessarily understanding their meanings very well.

Classification. The first of the two factors that Thurstone identified
tentatively he denoted I (Induction). There were thirty-six tests within the range of ±.20, and the minimum value was -.110. In the new rotations, thirty-three projections remained within the diminishing range, and the greatest negative projection was -.088; but the complexion of the factor changed altogether.


CLASSIFICATION

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(55) Sound Grouping               .285           .635
(54) Rhythm                       .319           .573
( 8) Figure Classification        .405           .455
(45) Syllogisms                   .325           .429
(37) Number Series                .503           .396
(35) Tabular Completion           .479           .263
(29) Areas                        .477           .262


Originally, Number Series, Tabular Completion, and Areas were most 
heavily weighted. In the revised solution most of the variance of these 
tests transferred to the factor called General Reasoning. Sound Grouping 
and Rhythm picked up substantial weight, mostly from the Spatial and 
Verbal Relations factors. Figure Classification and Syllogisms showed lesser 
gains. The final position of the axis was changed so significantly that test 
loadings in the two solutions are not directly comparable. 

Since the two leading tests in the revised solution represent an attempt 
to introduce material of auditory significance in paper-pencil form, the 
factor might be interpreted as an ability to respond to auditory stimuli. 
Such a conclusion would not explain the significant loadings of Figure Classi- 
fication and Syllogisms, however. The tests do seem to have one common 
feature, since in each a dichotomous classification has to be made rapidly. 
In Sound Grouping, groups of four words are presented; three sound alike, 
and one sounds different. The task is to discriminate between the two kinds 
of words. In Rhythm, each item includes four verses, three alike in rhythm 
and one different. The examinee selects the one that is different. Figure 
Classification presents a similar problem involving an entirely different 
medium. Each item consists of eight unclassified symbols, plus two groups 
of four related symbols. The task is to classify each of the symbols according 
to the dichotomy represented. The Syllogisms test has a fairly significant loading which is more difficult to explain. It is suggested that solving these problems rapidly is facilitated by an ability to think of the relationships dichotomously, e.g., considering a pair of individuals at a time and classifying one as younger and one as older. The tentative label “Classification” has been applied to this new factor. It is of interest here to call attention to a doublet factor isolated by Holzinger and Harman, which they named “Rhythm.” Rhythm and Sound Grouping, both with loadings of .53, were the only significantly weighted tests (4).










General Reasoning. Thurstone’s factor R (Restrictive Reasoning) had twenty-eight column entries in the range of ±.20, with a minimum of -.101. In the new solution, the factor with which R corresponded most closely contained thirty-three diminishing values and a maximum negative projection of -.088.


GENERAL REASONING

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(39) Arithmetical Reasoning       .583           .642
(38) Numerical Judgment           .534           .604
(29) Areas                        .295           .523
(34) Division                     .352           .498
(35) Tabular Completion           .180           .491
(37) Number Series                .091           .437
(44) Pattern Analogies            .341           .411
(60) Vocabulary                   .545           .359
(25) Mechanical Movements         .414           .343
(56) Spelling                     .410           .239
(11) Completion                   .481           .154
(58) Vocabulary (Chicago)         .457           -.026


With their projections extended significantly, Arithmetical Reasoning and Numerical Judgment head the new list. Other very significant changes occurred. Areas, Division, Tabular Completion, and Number Series made striking gains, picking up variance from Thurstone’s Factor I. On the other hand, the verbal tests, Vocabulary (Thorndike), Vocabulary (Chicago), Completion, and Spelling, transferred large portions of their variance to the Verbal factor.

Thurstone denoted this factor R, for restriction. “The common characteristic seems to be the successful completion of a task that involves some form of restriction in the solution.” (1, 88). Interpreting the factor presented difficulties: “The characteristic that is common to these tests, and to a lesser degree in the tests with projections between .30 and .40, is not easy to determine.” (1, 88). It now seems quite possible that much of the difficulty arose because reasoning and verbal tests were clustered together.

The new rotations make a clean-cut interpretation possible. Since 
the verbal element transferred to the verbal factor, the reasoning tests are 
left alone to represent the factor which now corresponds quite directly to 
the Army Air Force’s “General Reasoning.” Most tests remaining in the 
cluster are of a numerical nature, but their pure numerical variance is 
accounted for already in the Numerical factor. Pattern Analogies contains 
non-numerical problems and Areas also is largely non-numerical. 

Deduction. The second factor that Thurstone only tentatively identified was designated D, for deduction. Column D had thirty-three entries within the range of ±.20 and a minimum of -.097. In the second set of rotations eight additional tests entered the diminishing range, and the minimum was -.093. Again, the loadings of the identifying tests tended to increase while those of tests with lower loadings diminished, thus improving the structure.


DEDUCTION

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(42) False Premises               .578           .629
(40) Reasoning                    .525           .608
(37) Number Series                .287           .396
(25) Mechanical Movements         .403           .328
( 8) Figure Classification        .398           .226


The values for False Premises, Reasoning, and Number Series increased 
significantly, drawing variance from scattered sources. Both Mechanical 
Movements and Figure Classification transferred variance to the new Memory 
for Observed Relationships factor, while the former also lent variance to 
Visualization. Although “Its obvious common feature is the deductive 
nature of the four tests” (1, 88), Thurstone lacked confidence in the interpre- 
tation of Factor D because of the small number of significant loadings. The 
new rotations did not help in this latter regard, since they tended to isolate 
further False Premises and Reasoning, giving more of a doublet appearance 
to the factor. Since both of these tests consist of syllogistic reasoning 
problems, the new solution supports the identification of the factor as de- 
duction. 

Verbal Fluency. Column 10 in the original rotated matrix was said to represent a residual factor, even though it contained thirty-five values within the ±.20 range, with a minimum value of -.077, and five values above .40, the maximum being .501. The new rotations added three values to the diminishing range, with a minimum value of -.137.


VERBAL FLUENCY

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
( 9) Controlled Association       .480           .666
(58) Vocabulary (Chicago)         .292           .461
(10) Inventive Opposites          .316           .440
(11) Completion                   .428           .439
(13) First and Last Letter        .486           .419
(15) Anagrams                     .378           .390
(21) Form Board                   .501           .318
(25) Mechanical Movements         .402           .201


The new set of rotations made a meaningful interpretation possible. Controlled Association extended its projection from .480 to .666, borrowing the larger portion of this increased variance from the Verbal Relations factor. The values for the Chicago Vocabulary test and Inventive Opposites also increased significantly. The two tests that undoubtedly caused difficulty in the identification of the original factor, Form Board and Mechanical Movements, dropped out of the significant class, with the major portion of their variance reallocated to the new Visualization factor.

The task in Controlled Association is to write as many words as possible that are similar in meaning to a given stimulus word whose highly general meaning allows a multiplicity of responses. The comparatively high loading of the Chicago Vocabulary test may be due to the fact that the stimulus word is presented in a short phrase, whose subject matter furnishes cues to its meaning. Fluent recall of trial words to fit the phrase should help the subject arrive at correct responses. In Inventive Opposites the task is to supply two antonyms for a given test word. The Completion test emphasizes fluent verbal recall, as opposed to the verbal recognition tapped by the five-choice Vocabulary tests. In First and Last Letter the subject must write as many words as he can beginning with one given letter and ending with another, while Anagrams requires the examinee to make as many words as possible using only the letters of a given word. The heavily saturated tests now remaining apparently share a single common element: fluency in dealing with words and sentences (in contrast to the original fluency factor, which has here been reinterpreted as fluency in dealing with letters). It is interesting to note that the last two tests named have more significant loadings on the Letter Fluency factor than on Verbal Fluency.

Eduction of Relationships. Axis 11 was not rotated to positive manifold by Thurstone, although it apparently was used in some of the rotations, since there was more positive than negative weight represented. Thirty-three entries were within the ±.20 range, negative values as large as -.39 were present, and three values exceeded .40. The new rotations yielded thirty-one loadings in the near-zero range, with a minimum of -.124, and extended eight projections beyond .40.


EDUCTION OF RELATIONSHIPS

                                  Thurstone’s    Revised
Name of Test                      Solution       Solution
(41) Verbal Analogies             .313           .592
(44) Pattern Analogies            .217           .526
( 8) Figure Classification        .269           .447
( 7) Word Grouping                .087           .446
(28) Copying                      .248           .437
(43) Code Words                   .326           .416
(25) Mechanical Movements         .286           .411
(30) Number Code                  .448           .405
(21) Form Board                   .281           .396
(20) Flags                        .434           .212
(53) Hands                        .467           .198


Originally, Hands, Number Code, and Flags were the only tests with significant loadings. While one of these, Number Code, retained its position in the class of .40 and above, Flags and Hands had their loadings reduced to a near-negligible level, with most of their variance transferring to the Spatial factor. In the meantime seven tests passed Number Code, gaining weight essentially as follows: Pattern Analogies and Word Grouping, from the Perceptual Speed factor; Verbal Analogies, from the Perceptual Speed and Verbal Relations factors; Mechanical Movements, from the Residual factors 10 and 12; and Code Words and Copying, from scattered sources.
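The counts in this section can be checked directly against the Eduction of Relationships table. The Python sketch below, whose dictionary and function names are illustrative only, lists the tests at or above .40 in each solution and the tests whose revised loadings exceed that of Number Code.

```python
# Loadings (Thurstone's solution, revised solution) from the table above.
EDUCTION = {
    "Verbal Analogies": (0.313, 0.592),
    "Pattern Analogies": (0.217, 0.526),
    "Figure Classification": (0.269, 0.447),
    "Word Grouping": (0.087, 0.446),
    "Copying": (0.248, 0.437),
    "Code Words": (0.326, 0.416),
    "Mechanical Movements": (0.286, 0.411),
    "Number Code": (0.448, 0.405),
    "Form Board": (0.281, 0.396),
    "Flags": (0.434, 0.212),
    "Hands": (0.467, 0.198),
}


def salient(table, which, cutoff=0.40):
    """Names of tests whose loading in solution `which` (0 = Thurstone's,
    1 = revised) is at least `cutoff`, in alphabetical order."""
    return sorted(name for name, pair in table.items() if pair[which] >= cutoff)


original = salient(EDUCTION, 0)
revised = salient(EDUCTION, 1)
number_code_revised = EDUCTION["Number Code"][1]
passed = sorted(name for name, (_, new) in EDUCTION.items() if new > number_code_revised)

print(original)      # ['Flags', 'Hands', 'Number Code']
print(len(revised))  # 8
print(len(passed))   # 7
```

The three counts agree with the text: three salient tests originally, eight after the new rotations, and seven tests passing Number Code.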

This factor apparently transcends the nature of the subject matter, 
with two verbal and two non-verbal tests in the leading roles. Verbal 
Analogies and Pattern Analogies have a similar format. Both tests require 
the subject first to determine the relationship between two stimuli, then 
to make a choice of a stimulus that has the same relationship to a third 
given stimulus. The first test uses words and the second geometric figures. Word Grouping and Figure Classification also test an ability to see relationships among given stimuli. Again, one involves words and the other geometric figures. The significant loading of Copying is of interest because, superficially, the test appears to be entirely different from the others in the group;
yet, to solve the problems it is necessary to note relationships between a 
given pattern and one to be copied. Code Words requires the determination 
of English equivalents for words written in code. Problems are solved by 
noting relationships among symbols that correspond to the relationships 
among letters in words. Mechanical Movements is factorially complex, 
but the ability to note recurring relationships, such as the fact that two 
helical gears in mesh turn in opposite directions, is undoubtedly helpful. 
Number Code items require an examinee to perform calculations using a 
combination of Arabic numbers and code numbers. The relationships between 
the two types of numbers have to be determined before the problems can be 
solved. In Form Board the problem is to discover the relationship between 
a group of small geometric figures and a larger figure that the small ones can 
be arranged to fit. 

It is concluded that the ability to educe relationships is at the core 
of this factor. On few other factors has such a heterogeneous group of tests 
been held together by a single common tie. It is pertinent to note at this 
point that the first series of rotations leading to the new solution tended to 
isolate Flags and Hands on Axis 11, thus corroborating the AAF evidence of 
the existence of AAF Space 2 (3, 835, 839), and Thurstone’s Factor K, 
represented by Hands and Bolts (5). It became apparent, however, that 
continued rotations in this direction would not permit some large negative 
values to reduce satisfactorily. As a trial venture the poles of the axis were 
reversed, and a much more satisfactory solution was achieved. The con- 
figuration suggested strongly, however, that the extraction of another centroid
might well allow this second space factor to reveal itself and that much of
this variance would be drawn from the present Space factor. Thus, the 
Space factor described in this article should probably be regarded, at least 
provisionally, as a composite factor (10).

Visualization. Thurstone rotated Axis XII to positive manifold but
did not interpret it. There were forty loadings in the ±.20 range, with a
minimum value of -.096, and five loadings above .40. The new set of rotations
retained thirty-nine entries in the diminishing range, with a minimum value
of -.129, and produced four loadings above .40.
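The comparison of Thurstone's solution with the revised one rests on a few counts per factor column: how many loadings fall inside the ±.20 "diminishing range," the most negative loading, and how many loadings exceed .40. The sketch below computes those counts for one column. The helper name, the strict treatment of the ±.20 boundary, and the sample loadings are assumptions for illustration. They are not values from Thurstone's tables.

```python
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    in_hyperplane: int  # loadings with |a| < band (the "diminishing range")
    minimum: float      # most negative loading in the column
    salient: int        # loadings greater than the salience cutoff


def summarize_column(loadings, band=0.20, salient_cutoff=0.40):
    """Hypothetical helper: simple-structure counts for one factor column.

    Assumes the boundary value itself (exactly .20) is outside the band.
    """
    return ColumnSummary(
        in_hyperplane=sum(1 for a in loadings if abs(a) < band),
        minimum=min(loadings),
        salient=sum(1 for a in loadings if a > salient_cutoff),
    )


# Hypothetical column of loadings, not data from the article.
column = [0.05, -0.096, 0.12, 0.45, 0.53, 0.18, -0.02, 0.31]
print(summarize_column(column))
# ColumnSummary(in_hyperplane=5, minimum=-0.096, salient=2)
```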


VISUALIZATION

                                   Thurstone's     Revised
Name of Test                       Solution        Solution
(21) Form Board                      .397            O17
(24) Punched Holes                   Sat             .617
(19) Lozenges A                      .530            .537
(25) Mechanical Movements            .142            .396
(58) Vocabulary (Chicago)            .045            .308
(5)  Reading II                      .502            .162


The reason that Thurstone did not assign a unitary label to this factor 
may have been that the two verbal tests, Chicago Vocabulary and Reading
II, were loaded so heavily along with the three non-verbal tests, Lozenges A, 
Punched Holes, and Form Board. 

The additional rotations diverted a large proportion of the variance 
of the verbal tests to the verbal factor. Meanwhile, Form Board and Punched 
Holes increased their loadings strikingly. Mechanical Movements also 
gained substantially. Most of these increases in variance were due to rota- 
tions with Residual Axis 10 and the Restrictive Reasoning factor. 
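Rotating two orthogonal axes within their plane redistributes each test's variance between them. It leaves the sum of the test's squared loadings on the pair unchanged, and therefore its communality as well. For that reason the gains of Form Board, Punched Holes, and Mechanical Movements had to come from variance previously carried by other axes, here mainly Residual Axis 10 and Restrictive Reasoning. Reference (4) describes a graphical method for such rotations. The numerical sketch below shows only the equivalent plane rotation and is not the author's procedure. The function name, angle, and loadings are hypothetical.

```python
import math


def rotate_pair(a, b, theta):
    """Rotate factor axes A and B through angle theta (radians).

    a, b: loadings of each test on the two axes, as parallel lists.
    Returns the new loadings (a', b'). The rotation matrix is orthogonal,
    so a'^2 + b'^2 equals a^2 + b^2 for every test.
    """
    c, s = math.cos(theta), math.sin(theta)
    new_a = [c * x + s * y for x, y in zip(a, b)]
    new_b = [-s * x + c * y for x, y in zip(a, b)]
    return new_a, new_b


# Hypothetical loadings of three tests on two axes (not the PMA data).
axis_a = [0.40, 0.30, 0.05]
axis_b = [0.35, -0.20, 0.50]
rot_a, rot_b = rotate_pair(axis_a, axis_b, math.radians(30))
for x, y, u, v in zip(axis_a, axis_b, rot_a, rot_b):
    print(f"({x:.2f}, {y:.2f}) -> ({u:.3f}, {v:.3f}); "
          f"sum of squares {x * x + y * y:.4f} vs {u * u + v * v:.4f}")
```

Reversing the poles of an axis, as was done for the Space factor, is the special case of multiplying one column by -1. It too leaves every communality unchanged.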

In the Form Board test, the examinee is required to visualize how the 
separated parts of a geometric figure can be rearranged to fit the figure. 
In Punched Holes he must visualize the folding of a sheet of paper and the 
cutting of holes in the folded piece, and then recognize how the paper would 
look unfolded. For Lozenges A, the examinee must visualize a diamond- 
shaped card with a hole in one corner being picked up and turned over, 
then placed with the prescribed edge downward. In Mechanical Move- 
ments visualization is probably an aid to the individual in setting the ma- 
chinery in motion mentally. 

Thurstone’s description of spatial factor S, “facility in spatial and visual 
imagery” (1, 80) seems to describe this new factor even better than it does 
factor S. Thus, if the visualization label is adopted for the new factor, a
re-evaluation of the spatial factor is indicated.

The following are offered as hypotheses to distinguish between the
spatial and visualization factors.*

The first is that some problems can be solved either spatially or by 
visualizing. When only a slight degree of turning or rotating is required 
for an individual to orient himself with an external object, he is more likely 
either to move himself or to feel himself adjust empathically, perhaps kines- 
thetically, to the stimulus situation. If a greater degree of adjustment is 
required, however, it might be beyond the individual’s power to empathize.
In order to bring himself and the object into alignment, he would have to 
form a mental image of the object and then manipulate it into position. 
For example, consider an individual looking at a house whose foundation is 
askew. If, in orienting himself, he tends to line himself up with the house, 
he is “spatializing.” If, on the other hand, he imagines the house rotated
back into an upright position, he is “visualizing.”

The second hypothesis, following from the first, is that space and visuali- 
zation lie on a difficulty continuum. If visual stimuli must be rotated into 
new positions, the simpler problems can be solved by spatial empathy, 
whereas more difficult items involving several turns or rotations would evoke 
visualization. At the extremes of the continuum the easiest items would 
emphasize Perceptual Speed while the most difficult items would demand 
Reasoning. Thus, by merely varying difficulty or complexity, the same 
type of item could be used to measure four (or more) factors. This hypothesis
was put to the test, and the results are described in another paper (7).

The third hypothesis is that the spatial aspect of the spatial-visualization
processes is merely the determination of the direction of action, whether it
is left or right, up or down, forward or back. In other words, it involves a
directional discrimination and choice reaction.†

Memory for Observed Relationships. Axis XIII apparently has not 
entered into any of Thurstone’s rotations. Fifty-one tests had loadings
within ±.20, and none was greater than .274.

*In a series of analyses of mechanical aptitudes, Thurstone has recently made compar-
able distinctions (6). His S₁ resembles the well-known and better established Space factor,
while S₂ appears to be a composite of Visualization and the AAF Mechanical Experience
factor. His distinction of S₁ as visualization of a rigid configuration moved into different
positions, and S₂ as visualization of movements of parts within the configuration, and the
distinctions offered in the first and second hypotheses cited above, would seem to represent
an effort to distinguish between the same basic phenomena. It is here argued that the
internal-external and rigid-flexible distinctions are functions of, and are merely incidental
to, differences in difficulty and complexity. (See also the final paragraph under the Eduction
of Relationships factor.)

†Degan’s use of N. Wiener’s term ‘cybernetics’ as “the facility of discriminatory
decision in which the decision relates to the potential direction of movement” would seem
appropriate here (8). A block of paper-and-pencil tests designed with a similar hypothesis
in mind did not appear on the Spatial (Orientation) factor in Roff’s analysis of the Keesler
Field Battery (9). Instead, they helped define two other factors (Complex Perception and
Complex Reaction), and the variance of Complex Coordination, which headed Degan’s
Cybernetics factor and which appeared consistently on AAF Space I, was confined to the
Psychomotor factor. Obviously, clear-cut distinctions among all the factors in the Spatial-
Visualization domain must await further study.

The attempt was made in the new solution to achieve a psychologically
meaningful position for this axis, and in doing so the requirements of simple
structure and positive manifold were met fairly well. Forty-two entries
remained in the diminishing range of ±.20, while three values were raised
to above .40, two of them approaching .60.


MEMORY FOR OBSERVED RELATIONSHIPS

                                   Thurstone's     Revised
Name of Test                       Solution        Solution
(51) Picture Recall                  .176            .591
(14) Disarranged Sentences           .204            .570
(52) Theme                           .172            .424


The test now leading the list picked up variance from a number of 
sources, but primarily from Perceptual Speed, and secondarily from Number 
and Memory. Disarranged Sentences also increased its variance considerably, 
borrowing chiefly from Verbal Relations, Numerical Operations, Memory, 
and Word Fluency. Theme gained its weighting primarily from residual 
Axes 11 and 12 and secondarily from Restrictive Reasoning, Word Fluency, 
and Memory. 

Interpreting this new factor is difficult, and only a tentative name 
has been adopted for it. In Picture Recall a picture is studied for a limited 
period, after which questions must be answered regarding various details. 
On the basis of this test alone there is a strong suggestion of visual memory,
a welcome conclusion since a visual-memory factor has been identified in a
number of previous studies, especially those performed by aviation psychol- 
ogists. Accounting for Disarranged Sentences, however, requires some 
explanation. The examinee has to rearrange words into their correct order 
before he can answer the questions asked. Visual memory might be an aid, 
since the examinee with a strong ability to recall visually the construction of 
a sentence might be better equipped to set the words into their proper places. 
It is also possible to rationalize the presence of Theme, which has the third 
highest loading on the factor. The subject is asked to write about an ac- 
quaintance just as he might discuss him with a friend. Often enough this 
task involves description of physical characteristics as well as physical activi- 
ties where the ability to recall a visual picture should aid the writer. 

However reasonable the foregoing explanations of the visual-memory 
element in Disarranged Sentences and Theme may seem, the argument is 
weakened by the absence of other tests which should appear by the same 
token. Disarranged Words, for example, seemingly should involve the 
same processes, but its loading is only .09. If pure visual memory were the
key, the ability should be equally useful in reconstructing either sentences or
words. The disparate factor pattern suggests that some element of
meaningfulness is probably important in the former task that does not enter
into the latter.

In speculating regarding the possible breakdown that might be achieved 
in the area of memory through the application of factor methods, it may 
be recalled that Thurstone raised the question of whether there might be 
a factorial distinction between rote memory and memory for ideas (1, 86). 
Possibly this new factor might stress the memory for ideas as opposed to 
the rote memory measured by factor M. 

It is entirely possible that those who made good scores on the Picture 
Recall test did so by applying what might be termed “logical memory” (1, 86);
that is, they observed the various relationships in the picture, perhaps 
verbalizing them. As observing individuals, they would habitually look 
for related elements, tying them together in a logical pattern as an aid to 
memory. Their ability to recall the proper order of words in sentences would 
be due to having concentrated upon word relationships. This ability would 
show up in Theme since observant people would be better able to recall 
accurately various characteristics of friends and acquaintances. 

This latter ability seems to be a more nearly unique feature of the three 
tests than the former, although it is always possible, of course, that other 
variance may be represented also. The tentative term “Memory for Observed
Relationships” has been adopted pending clarification from other studies.


REFERENCES 


1. Thurstone, L. L. Primary mental abilities. Psychometric Monogr., No. 1. Chicago:
Univ. Chicago Press, 1938.

2. Zimmerman, W. S. Visualization tests. In J. P. Guilford (Ed.), Printed classification
tests. Army Air Forces, Aviation Psychology Program Research Reports, Report No. 5.
Washington: U. S. Govt. Printing Office, 1947.

3. Guilford, J. P. Factorial picture of tests and criteria. In J. P. Guilford (Ed.), Printed
classification tests. Army Air Forces, Aviation Psychology Program Research Reports,
Report No. 5. Washington: U. S. Govt. Printing Office, 1947.

4. Zimmerman, W. S. A simple graphical method for orthogonal rotation of axes. Psycho-
metrika, 1946, 11, 51-56.

5. Holzinger, K. J., and Harman, H. H. Comparison of two factor analyses. Psycho-
metrika, 1938, 3, 45-60.

6. Thurstone, L. L. An analysis of mechanical aptitudes. Psychometric Laboratory
Report, No. 62. University of Chicago, 1951.

7. Zimmerman, W. S. The influence of changes in item complexity upon the factor com-
position of a spatial-visualization test. Educ. psychol. Measmt., in press.

8. Degan, J. W. A reanalysis of the Army Air Force battery of mechanical tests. Psycho-
metric Laboratory Report, No. 58. University of Chicago, 1950.

9. Roff, M. F. Personnel selection and classification procedures: Perceptual tests, a factor
analysis. School of Aviation Medicine, Project Report, April, 1950.

10. Zimmerman, W. S. A note on the recognition and interpretation of composite factors.
Psychol. Bull., in press.


Manuscript received 6/26/52 


Revised manuscript received 8/23/52 








BOOK REVIEW 


GERHARD TINTNER. Econometrics. John Wiley and Sons, N.Y., 1952, pp. xiii + 370, $5.75


Gerhard Tintner, fellow of the Econometric Society, the Institute of Mathematical 
Statistics, and the American Statistical Association, has completed the useful task of 
providing the first textbook on modern methods of econometrics. Other volumes in the 
field are not competitive with Professor Tintner’s book, in that they do not cover the 
methodological contributions of the past decade. 

Econometrics deals with a general discussion and illustration of the subject, the 
application of multivariate statistical methods to economic data, econometric model 
construction, and a study of time series analysis. In an appendix the author gives a brief 
discussion of matrices, determinants, and computational methods. Numerous examples 
are given throughout to illustrate the techniques developed. The subject matter is 
mathematical in nature and kept on a high plane without being made oppressively rigorous. 

Two outstanding contributions of the book are the discussions of multivariate 
statistical methods and certain aspects of time series analysis. In the section on multi- 
variate methods, Tintner introduces the reader to such topics as multiple regression, 
discriminant analysis, principal components, canonical correlation, and weighted re- 
gression. Economics students are not especially familiar with these methods and will 
find Tintner’s exposition useful. Unfortunately, he did not choose a set of illustrative 
examples that will simultaneously be instructive to other social scientists. There is great 
need for the application of multivariate methods to the analysis of survey data involving 
large samples of individual respondents and numerous personal variables. Had he chosen 
examples from economic data collected in surveys, he would simultaneously have struck 
a note appealing to psychologists and sociologists. 

A substantial portion of the section on time series analysis covers quite conventional
material on the measurement of trend and of seasonal and cyclical variation in economic 
activity. These matters are discussed in most elementary texts on economic and business 
statistics, although in a less mathematical form. Well-known mathematical techniques 
of fitting orthogonal polynomials, logistic curves, and Fourier series; smoothing by moving 
averages; and periodogram analysis are included. The treatment of serial correlation, 
stochastic difference equations, autoregressive schemes, and correlogram analysis is more 
interesting and less familiar. 
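The review singles out correlogram analysis as less familiar material. A correlogram plots the serial correlation r_k of a series against the lag k. Below is a minimal sketch of the common estimator r_k = Σ(x_t - x̄)(x_{t+k} - x̄) / Σ(x_t - x̄)², applied to a hypothetical first-order autoregressive scheme. The function name, the coefficient 0.8, and the seed are assumptions. Texts also differ on the divisor used, so Tintner's own formulas may not match this one exactly.

```python
import random


def correlogram(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series (max_lag < len).

    Uses the full-sample sum of squared deviations as the denominator;
    some texts instead rescale each r_k by the number of available pairs.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical autoregressive scheme x_t = 0.8 * x_{t-1} + e_t.
random.seed(1)
x = [0.0]
for _ in range(499):
    x.append(0.8 * x[-1] + random.gauss(0.0, 1.0))
r = correlogram(x, 5)
print([round(v, 2) for v in r])  # r_0 is 1; later values decay roughly like 0.8**k
```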

Cleavages exist among econometricians, and Tintner’s approach to the subject is 
one that fails to capture what the reviewer regards as the singular contribution of econo- 
metrics to methods of social science research. A feature of econometric methods not found 
in psychometrics or empirical studies in other social sciences is the systematic blending 
of a priori information and empirical observation. Social science investigations often
proceed by purely empirical methods of reasoning. Data are searched for regularities 
and high correlations. The alternative approach of some econometricians is to use a priori 
information such as economic theory, institutional practices, legal restrictions, and tech- 
nological information to fashion a mathematical model of the economy. It is in the use 
of the economic theory of behavior to formulate testable hypotheses that other social 
scientists could possibly derive some benefit from a study of econometric methods. The 
a priori information of all types serves to define the class of variables being considered 
and many specifications about the mathematical form of the relationships used. The 
latter specifications are, however, seldom complete; hence simple functional forms are 
widely used to expedite computational and other analytical efforts. Econometric models
constructed on the basis of a priori reasoning are then confronted with statistical observa-
tions. The structural characteristics, the parameters, of the model are estimated from 
the data and identified with basic economic concepts. 

Another outstanding characteristic of modern econometrics is that the stochastic 
properties of models are explicitly developed at the outset of analysis. Tintner fully 
presents this aspect but gives inadequate attention to the choice between two main alter- 
native stochastic models. One model assumes that individual variables are subject to 
error, say measurement error, while another assumes that behavior is subject to error, 
say through the neglect of explicit treatment of minutiae, rare events, and nonmeasurable 
quantities. Tintner implicitly tells the reader that both errors in variables and errors in 
equations are present, that there are inherent statistical difficulties in using a stochastic 
model based on both types of error, and that therefore we must arbitrarily assume one 
model or the other. His preference visibly is for the errors-in-variables model. He fails
to emphasize for the reader that it is, in principle, possible to obtain accurate measurements,
that we are moving in the direction of better and better statistical measurement of economic
data, and that in systems involving large numbers of individuals making free choices,
behavioral disturbances are inevitable. It is virtually inconceivable that the social
behavior of individuals could be described completely by a set of measurable variables
that the human mind of an investigator can simultaneously manipulate. The reviewer
has a distinct preference for models whose stochastic structure permits explicit disturbance 
of behavior (errors in equations) and feels that other social science studies should use a 
similar probability scheme. Tintner devotes a chapter to rather formal calculations 
with errors-in-equations models. In this respect his book is inadequate.
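The practical difference between the two stochastic models already shows up in a single regression y = βx. A disturbance in the equation leaves the least-squares slope centered on β. Measurement error u in x, by contrast, pulls the slope toward zero by the factor var(x) / (var(x) + var(u)), which in psychometric terms is the reliability of the observed x. The simulation below is a hypothetical illustration of that standard result, not an example from Tintner's book. The values β = 2, the unit variances, the sample size, and the seed are arbitrary choices.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (regression with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, beta=2.0, seed=0):
    """Return (slope under errors in equations, slope under errors in variables)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # var(x) = 1
    # Errors in equations: behavior is disturbed, x is observed exactly.
    y_eq = [beta * x + rng.gauss(0.0, 1.0) for x in true_x]
    # Errors in variables: the relation is exact, x is observed with error, var(u) = 1.
    y_var = [beta * x for x in true_x]
    obs_x = [x + rng.gauss(0.0, 1.0) for x in true_x]
    return ols_slope(true_x, y_eq), ols_slope(obs_x, y_var)


slope_eq, slope_var = simulate()
print(f"errors in equations: {slope_eq:.3f} (beta = 2.0)")
print(f"errors in variables: {slope_var:.3f} (about 2.0 * 1 / (1 + 1) = 1.0)")
```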

Tintner makes a happy use of examples to illustrate his methods, and this, in itself, 
adds greatly to the pedagogical contribution. The examples are not, however, well chosen 
to bring out the best of econometrics. The reader may get the impression that the subject 
is not to be taken too seriously, because Tintner frequently summarizes the results of 
an example by warning the reader to accept the findings only with the greatest of caution 
due to the fact that a number of assumptions are probably not fulfilled. There are actual 
empirical studies which are to be taken seriously and in which careful programming attempts 
to fulfill the underlying assumptions. Tintner’s attitude is overly negativistic, but he 
could have made a better choice of examples by selecting those yielding results in which 
he could have some faith and about which he would not have to be apologetic. A smaller 
number of examples, elaborate enough and penetrating enough to show what econometrics 
can truly accomplish, would have been preferable. Students may wonder, after having 
worked through Tintner’s text, what would be an acceptable econometric investigation. 
Is a subject mature enough to warrant a textbook if the accomplishments are no less 
subject to criticism than Tintner’s examples? Surely Tintner cannot feel that empirical 
econometric studies are as weak as he leads one to believe his examples are; otherwise 
he would be in another profession. 

Tintner has so many examples that he is forced to give each only a superficial analysis. 
Some of his numerical findings appear anomalous, and he gives offhand explanations,
whereas the reviewer would offer quite different ones. This suggests that these
examples need more thorough econometric treatment than they are given in the book.

Tintner’s style is not pleasing, in that his pages are cluttered with far too many 
references. Some of them seem to be purely superfluous or irrelevant. In a textbook it 
is less necessary than is ordinarily the case to give credit for independent research results 
on common subjects. An annoying feature is the occurrence of numerous misprints: some 
in equations, some in literary text, and some among the many references. 


University of Michigan L. R. Klein 

















